Chinese System Usability Scale: Translation, Revision, Psychological Measurement
Yuhui Wang (a), Tian Lei (b), and Xinxiong Liu (a)
(a) Industrial Design Department, Huazhong University of Science and Technology, Wuhan, China; (b) School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
ABSTRACT
In this study, the Chinese version of the System Usability Scale (SUS) was re-translated through a strict forward-back translation process supplemented by an interview-based revision and selection of translation candidates. The revised translation closely matches the linguistic usage of native Chinese speakers and contains no ambiguity. Psychometric evaluation shows the revised Chinese version to be reliable, valid, and sensitive. We also conducted a within-group comparative study confirming that the reliability of the cross-culturally adapted version is higher than that of the literally translated version. The questionnaire provides a tested tool to help practitioners complete usability assessments with Chinese-language users.
1. Introduction
Usability research may involve inherent, performance-based, or perceived assessments. Perceived usability is the user's direct, self-reported sense of a given system's intra-task coherence, efficiency, organization, user-friendliness, and immediacy (Mcgee, Rich, & Dumas, 2004). Perceived usability is tested to judge the user's intuitive experience of the system (Park, Han, Kim, Cho, & Park, 2013; Tullis & Albert, 2008; Vermeeren et al., 2010). This is important because users are more likely to recommend products to others based on intuitive feelings, and higher perceived usability can enhance customer loyalty (Flavián, Guinalíu, & Gurrea, 2006; Sauro, 2010).
Measures of perceived usability typically center on subjective questionnaires (Yang, Linder, & Bolchini, 2012), whose scores provide usability information and ratings. Questionnaires used for this purpose include the USE, QUIS, UMUX, PUTQ, and SUS; the SUS is the most commonly used among them (Assila, De Oliveira, & Ezzedine, 2016; Lewis, 2018b). The SUS contains only 10 items and can be completed quickly (Brooke, 1996). Versions of the SUS have been widely utilized to test perceived usability (Brooke, 2013; Lewis, 2018b; Sauro & Lewis, 2009); the questionnaire also provides high reliability (0.91) and sensitivity (Blažica & Lewis, 2015), is free to use, and yields good test results with small samples (Tullis & Stetson, 2004). Everyday products (Kortum & Bangor, 2013), all manner of systems (Konstantina, Nikolaos, & Christos, 2015), mobile applications (Kortum & Sorber, 2015), and web sites (Flavián et al., 2006) have been tested for usability with the SUS.
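Since SUS scoring underlies all of the statistics reported below, a minimal sketch of the standard scoring rule (Brooke, 1996) may help readers replicate it; the function name and example responses here are ours, purely for illustration.

```python
def sus_score(responses):
    """Standard SUS score (Brooke, 1996).

    responses: list of 10 integers in 1..5, in item order.
    Odd-numbered items (1, 3, 5, ...) contribute (response - 1);
    even-numbered items contribute (5 - response).
    The summed contributions are scaled by 2.5 to a 0-100 range.
    """
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Example: a moderately positive response pattern scores 75.0.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```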
SUS is commonly deployed for usability evaluations in China, from the evaluation of medical equipment (Yan, Wang, Liu, & Wu, 2012) to health monitoring systems (Jia et al., 2013), ATM interfaces (Wang & Lv, 2017), and in-vehicle information systems (Li, Chen, Sha, & Lu, 2017). In these studies, the SUS was either directly translated or distributed to participants without psychometric evaluation. Scales that have not been tested by psychological measures may yield inaccurate measurements, and a cross-cultural questionnaire translation may also affect the results (Finstad, 2006). The SUS version currently used in mainland China is based on a published Chinese version of "Quantifying the User Experience" (Sauro & Lewis, 2014). In many unpublished studies and interviews with IT companies in China, this version was shown to render certain items poorly and to exhibit notable cross-cultural reading differences. In this study, we re-translated the Chinese SUS version to improve its adaptability and localization.
2. Related work
2.1. Previous SUS translations
SUS, the most widely used tool for measuring perceived usability, should not be limited to those who are fluent in English (Lewis, 2018b). The original SUS may not be suitable in multicultural environments, as non-native English speakers may interpret it differently (Finstad, 2006). To allow for wider usage without sacrificing consistent evaluation results, translating the questionnaire into local languages is essential. The questionnaire can be roughly translated, but the translation must be accompanied by psychological measurement and localization to ensure it is properly understood. The SUS has already been strictly translated into Arabic (Alghannam, Albustan, Al-Hassan, & Albustan, 2018), Slovene (Blažica & Lewis, 2015), Polish (Borkowska & Jach, 2017), Portuguese (Martinsa, Rosa, Queirós, & Silva, 2015), Italian (Borsci, Federici, Bacci, Gnaldi, & Bartolucci, 2015), Malay (Mohamad, Yaacob, & Yaacob, 2018), and other languages. These translations extend the SUS to non-English-speaking users and include effective psychometric evaluation.
Compared to the original English version of the SUS (Table 1), certain items are typically modified to accommodate respondents from different cultural backgrounds. In existing SUS translations, it is common to substitute words without changing their meanings. The word "cumbersome" in Item 8, for example, can cause confusion in certain translations and may be better replaced by "awkward" (Bangor, Kortum, & Miller, 2008; Finstad, 2006). In the Polish translation (Borkowska & Jach, 2017), "awkward" is replaced with "inconvenient"; in the Arabic translation (Alghannam et al., 2018), with "strange". In one Chinese version (Sheu, Fu, & Shih, 2017), the translation is "a little troublesome to use". Item 6 does not readily translate into multiple languages, either (Borkowska & Jach, 2017).
Psychometric evaluation of questionnaires should generally cover reliability, construct validity, and sensitivity (Sauro & Lewis, 2011), and translations must ensure appropriate psychometric properties. The Cronbach alpha coefficient of existing translations is generally above 0.8, which exceeds the lower limit of 0.7 but usually falls below the English version. Construct validity also must be tested to determine the consistency of the questionnaire's underlying factor structure. There are two potential factors, learnability and usability, which can be measured for this purpose; however, the current versions of the SUS are not strictly consistent in this regard, i.e., certain items may be biased. "Sensitivity" mainly refers to whether the questionnaire responds appropriately when participants' usage frequency or amount of use of the system changes (Alghannam et al., 2018; Blažica & Lewis, 2015). It is usually assessed via t-test or F-test (Alghannam et al., 2018; Bangor et al., 2008); if the significance level is below the critical value of 0.05, the questionnaire is deemed sensitive. The existing translated versions of the SUS are generally sensitive (Lewis, 2018b). Tests of concurrent validity have consistently shown that translated versions of the SUS adequately correlate (r > 0.30) with measures such as likelihood-to-recommend, CSUQ, PSSUQ, and UMUX-LITE (Lewis, 2018b).
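As a concrete reference for the reliability criterion above, here is a minimal sketch of the Cronbach alpha computation on a respondents-by-items matrix; the demo data are randomly generated for illustration only.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of totals)
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Invented 5-respondent x 10-item example; real use would pass the
# per-item responses collected from a study.
demo = np.random.default_rng(0).integers(1, 6, size=(5, 10))
print(round(cronbach_alpha(demo), 3))
```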
Although the SUS was once considered to have a two-factor structure of "Learnability" and "Usability" (Borsci, Federici, & Lauriola, 2009), more recent work has not replicated this finding (Lewis, 2018b). After completing a large-sample analysis, Lewis and Sauro (2017b) recommended treating the SUS as a measure of a single dimension, perceived usability.
To sum up, although some items are difficult to translate in the process of SUS localization, psychometric evaluation has shown that the translated versions in various languages are reliable and suitable for native-language users, and the reliability of the Chinese translation is acceptable.
2.2. Necessity for Chinese re-translation
Usability research began later in China than in other countries but has grown very rapidly in recent years (Lei, Xu, Meng, Zhang, & Gong, 2014; Wang, 2003) and continues to produce meaningful results (Gao et al., 2013; Li & Li, 2011; Liu, 2014; Liu, Zhang, Zhang, & Chen, 2011). The SUS, as discussed above, is the most widely used usability evaluation questionnaire. Though the SUS has been directly translated into Chinese, accompanying psychometric evaluation is still lacking and sizable obstacles to understanding remain. Users of different cultures may evaluate usability differently (Herman, 1996; Noiwan & Norcio, 2006; Rajanen et al., 2017). Several unpublished studies report problems in understanding items on the current Chinese SUS version; in other words, the questionnaire does not apply without modifications to localize it. Such modifications are lacking from several key versions of the questionnaire that have been directly translated (Sauro & Lewis, 2014; Sheu et al., 2017; Tullis & Albert, 2008).
User testing is necessary after forward-back translation to judge questionnaire efficacy, as translation experts are usually not users. This step is very important, as it can reveal "hidden" issues with the translation, help participants better understand the questionnaire, and provide an important reference for subsequent revision of the translated version. Sharfina and Santoso (2017) also pointed out that the literal translation of an original version is insufficient without further cultural adjustments; Blažica and Lewis (2015) and Mohamad et al. (2018) reached similar conclusions. The Chinese SUS version currently in use is a temporary translation that has not been localized; certain items remain open to misunderstanding even after the forward-back translation is complete.
Table 1. Forward translation results; item 8 contains two translations.
The original version of SUS | Forward translation results
1. I think that I would like to use this system frequently. | 1. 我想我会经常使用这个系统。
2. I found the system unnecessarily complex. | 2. 我发现该系统过于复杂。
3. I thought the system was easy to use. | 3. 我认为该系统易于使用。
4. I think that I would need the support of a technical person to be able to use this system. | 4. 我认为我需要技术人员的帮助才能使用这个系统。
5. I found the various functions in this system were well integrated. | 5. 我发现这个系统很好地集成了各种功能。
6. I thought there was too much inconsistency in this system. | 6. 我认为这个系统中存在大量的不一致。
7. I would imagine that most people would learn to use this system very quickly. | 7. 我想大多数用户能很快学会使用该系统。
8. I found the system very awkward to use. | 8. 我发现这个系统使用起来很别扭。/ 我发现这个系统使用起来很麻烦。
9. I felt very confident using the system. | 9. 我使用这个系统时,感到很有信心。
10. I needed to learn a lot of things before I could get going with this system. | 10. 在使用这个系统之前,我需要学习很多相关知识。
2.3. Cross-cultural adaptation and methods
Cultural differences can affect the accuracy of instrument translation. We cannot assume that a particular concept has the same relevance across various cultures, and intercultural questionnaire translations are rife with disparities in spoken phrasing, word clarity, and word meaning. Simple, verbatim translation does not sufficiently reflect cultural and linguistic differences (Hilton & Skrutkowski, 2002).
Construct bias, method bias, and item bias are known to be caused by cultural differences in cross-cultural research on various instruments. Among them, construct bias and item bias most significantly affect instrument translation quality (Van de Vijver & Leung, 1997; Van de Vijver & Poortinga, 1997). "Construct bias" refers to inconsistency in a group's internal construct of a concept (Van de Vijver & Rothmann, 2004). The emergence of such bias may be related to the different cultural backgrounds of different groups, but current research on SUS translation does not account for this. Lewis and Sauro (2017b) studied 9156 samples and posited only one factor structure for the SUS, but did not report whether the samples were from English-speaking countries or countries with other cultural backgrounds.
Because perceived usability testing has commonly used specifications and testing methods (Lewis, 2006; Rubin & Chisnell, 2008), and the objects of measurement do not change across cultures, there is almost no method bias in the SUS. "Item bias" refers to the fact that individuals of various cultural groups do not score the same single item in the same way (Shepard, Camilli, & Averill, 1981). Item bias typically arises when item content is unfamiliar or specific to a certain culture (Vijver & Tanzer, 2004). For example, Finstad (2006) found that non-native English speakers do not readily understand "cumbersome" but are more familiar with "awkward".
In actual translation practice, certain instruments must be cross-culturally adapted to mitigate the bias caused by cultural differences. Ideally, a preliminary qualitative study with participants from the target culture is completed first. Each item must be expressed in parallel terminology in the target language, and the phrasing must be fully clear (Hilton & Skrutkowski, 2002). In the process of translating the CSUQ into Turkish (Erdinç & Lewis, 2013), the translators assessed the semantics, idioms, and concepts of the items while altering the expression of certain verbs and forms, modifying a total of 14 items. In a translation of the PSSUQ into Portuguese (Rosa et al., 2015), the translators likewise carried out a cultural adaptation process to ensure the Portuguese and English versions were equal in semantics and content.
Pretests or interviews with specific groups can also improve translation accuracy (Dianat, Ghanbari, & Asgharijafarabadi, 2014; Sharfina & Santoso, 2017). Questionnaires beyond usability assessment have been treated similarly. For example, in the process of translating the DASH 17+ questionnaire (Cardoso & Capellini, 2018), the translators altered two options, completed a pretest, asked the students to report their understanding of the questionnaire, and asked the participants themselves to help modify the sentences.
Pretests or interviews can also be used to optimize expert translations of other scales into Chinese. Chen, Hao, Feng, Zhang, and Huang (2011) selected 20 families for a pretest on expert translations of the PEDSQL survey; the participants gave feedback including suggestions to revise the scale translations. Pei, Xia, and Yan (2010) completed a pretest via short interviews with 15 participants when translating the FABQ into Chinese. In the process of translating the HAGOS score (Cao et al., 2018), 20 participants also completed a pretest on versions translated by experts; after absorbing their feedback, the translators formed the final Chinese version.
In these studies, pretesting was completed through interviews with user groups to optimize expert translations. The purpose was to eliminate or reduce the bias caused by different levels of knowledge between the measured group and the translation team. However, among the various language versions of the SUS, apart from the Indonesian and Persian versions, no other team has completed a pretest in its translation process.
In this study, similar to the published versions, we also adopted a forward-back SUS translation process. To better suit native Chinese-speaking participants, we added a structured interview: participants were asked to report the obstacles they encountered in reading the translated version, and we adjusted the translation according to their feedback. After the back-interview, the participants finally chose a translation that was built into the formal version.
3. Translation methodology design
The forward-back translation method we used in this study
was also used by Alghannam et al. (2018), Blažica and Lewis
(2015), Borkowska and Jach (2017), and Sharfina and
Santoso (2017).
3.1. Forward translation
We used forward translation to put the original English SUS into Chinese, adopting the Finstad (2006) version (for non-native speakers), wherein Item 8 uses "awkward". The forward translators were five native Chinese speakers, all of whom have passed TEM-8 (the highest-level English language test given in mainland China). Each person translated the questionnaire independently, then discussed and optimized their work with Chinese-language Ph.D. holders. If the majority of the translators produced the same result, their wording was used directly; if there was a substantial difference, the two majority translations were retained (Table 1).
After confirmation by Chinese-language experts, we found no difference in Item 8 between the original and the two published Chinese versions (Sauro & Lewis, 2014; Sheu et al., 2017). The verbs 想 (think) and 认为 (believe) share similar meaning in Chinese, and translations of certain items with different word orders are the same; for example, "I think most people can use the system quickly by learning" and "I think most users can learn to use the system quickly" have the same meaning.
We found differences among the forward translations for Items 5, 6, and 8. On Item 6, three Chinese translators suggested "不一致" ("inconsistency") while the other two suggested "前后矛盾" ("contradiction") and "不稳定" ("instability"); the published version uses "inconsistency". On Item 5, unlike the published version, all translators suggested "集成" rather than "整合". Their back translations were basically the same, but in Chinese the "集成" expression is more formal (and accurate) while "整合" is more colloquial.
Item 8 proved very tricky to translate. Three versions use the phrase "别扭" ("awkward") and two use "困难" ("very difficult"). One published version uses "麻烦" ("trouble") and another "troublesome". This confirms a previous assertion (Bangor et al., 2008; Finstad, 2006) that translations of this item differ notably among non-native English speakers. Three of the translators suggested the word "别扭", which means "strange" or "obstructed" in Chinese.
3.2. Back translation
Most of our translators suggested that Item 6 retain the phrasing "不一致" in the back-translated version and that Item 5 use "集成". Item 8 was translated with two words, "别扭" and "麻烦", so we provided both versions for back translation to another eight independent translators (also holding TEM-8 certification), whom we divided into two groups. The two groups returned basically the same results, again with the exception of Item 8: in the first group, all four translators suggested "awkward", while in the second group, none used the word "awkward" at all. This marks an important departure from previous studies on the Chinese SUS translation, as "别扭" has never appeared in any published version. The preliminary translation we established after this process is shown in Table 2.
3.3. Structured interviews
The forward translation reported here was produced by translation experts. It remained unclear whether native Chinese speakers could understand the meaning of the items or whether ambiguity lingered in the translation. We therefore conducted preliminary testing similar to Finstad (2006) and Sharfina and Santoso (2017), but in a slightly different two-part process. The first part was a structured interview regarding the translated results, wherein a moderator asked the participants to freely report their level of understanding of each item, including any ambiguous items. In the second part, we modified the questionnaire according to the opinions provided in the first part: we listed several options for each ambiguous item and asked the participants to select the one they felt was most appropriate.
Participants: We selected 31 participants, including professors and graduate students from the School of Industrial Design and employees of local IT companies. The participants had completed multiple perceived usability assessments prior to this study using the UMUX, USE, and SUS (directly translated Chinese version). All were aged 20–46 years at the time of the study, and all were native Chinese speakers.
Research methods and measures: The purpose of the
survey was to determine how well native Chinese speakers
understand the preliminary SUS translation described above.
Participants were required to report whether they understood
each item and to explain any items they did not understand.
Procedures: Moderators distributed the preliminary Chinese SUS translation to the participants, explaining that it was a newly translated questionnaire for evaluating perceived usability among native Chinese speakers. The moderator asked each participant to read each item aloud before providing a response of understanding, incomprehension, or ambiguity. Responses were recorded whenever they contained any self-reported misunderstanding.
Somewhat surprisingly, almost all the participants stated that they did not understand or had doubts regarding the translation of Item 6, and about half stated that Item 9 contained ambiguity. The results are shown in Table 3. Chinese-language experts suggested that the participants did not understand the directivity of the phrase "inconsistency". On Item 9, the process of building "confidence" was unclear to the participants; the original English SUS does not explain this phrasing. The moderator asked the participants to report further opinions on the ambiguous items, including the phrases (i.e., content) they found problematic. The frequency of problem-content occurrence for Items 6 and 9 is shown in Table 4.
3.4. Modification of preliminary translation according to
interview results
We considered the correlations between SUS items (Bangor et al., 2008) as discussed in earlier studies (Sauro & Lewis, 2011). Based on a large sample, Lewis and Sauro (2017b) found that the SUS is most suitable as a single dimension: although the factor loadings of Items 4 and 10 are larger than those of the other items, there is so far no evidence that they belong to a second factor. Since no items encapsulate esthetic or visual factors, we excluded "inconsistencies" of esthetics and interface style from the presentation of Item 6. According to the statistical results, we revised Items 6 and 9, as listed in Table 5.

Table 2. Preliminary Chinese SUS translation.
1. 我想我会经常使用这个系统。
2. 我发现这个系统过于复杂。
3. 我认为这个系统使用起来很容易。
4. 我认为我需要技术人员的帮助才能使用这个系统。
5. 我发现这个系统很好地集成了各种功能。
6. 我认为这个系统中存在大量的不一致。
7. 我想大多数用户能很快学会使用该系统。
8. 我发现这个系统使用起来很别扭。
9. 我使用这个系统时,感到很有信心。
10. 在使用这个系统之前,我需要学习很多知识。

Table 3. Participant-reported frequency statistics (items not understood or ambiguous).
Item:      1  2  3  4  5  6   7  8  9   10
Frequency: 0  0  0  0  1  26  0  1  15  0
3.5. Back-interview
Twenty-seven of our 31 participants followed up within one day. They were first asked to read all the alternative options, compare them with the preliminary translation, and report any ambiguity; they then reported the expression they thought would best replace Items 6 and 9. Twenty-five participants reported no ambiguity. The other two reported misunderstandings attributable to above-average reading speed, and these misunderstandings were eliminated after a simple explanation. The frequency results are shown in Table 6.
We built the options with the highest selection frequency into the final Chinese SUS translation. This final version is shown in Table 7; the corresponding English originals appear in Table 1.
4. Experiment I
4.1. Participants
We recruited 217 native Chinese-speaking participants, 116 male and 101 female, via a crowdsourcing platform. The participants ranged in age from 19 to 42 at the time of participation. We added an educational background restriction (Bachelor's degree or above) to ensure a consistent base-level understanding of Chinese.
4.2. Experimental design
We used the assessment methods of Kortum and Bangor (2013) and Blažica and Lewis (2015), applying them to an everyday product in an online usability test.
4.3. Method
The questionnaire was completed online. Once the participant
was recruited, he or she scanned a QR code in WeChat and
was directed to the test page. The participants then filled in
the questionnaire online according to the given instructions.
4.4. Materials
The participants evaluated the usability of the Jing Dong (JD) mobile phone application (app). The JD app, a shopping platform widely used in China, had roughly 500 million monthly active users in 2019. In mainland China, the Taobao shopping app has the most users of any similar application, at 1.01 billion. We selected the JD app as the evaluation object to prevent a ceiling effect (e.g., participants may be very familiar with mobile Taobao). The homepage of the JD app is shown in Figure 1.
Table 4. Frequency of problem-content occurrence and examples for Items 6 and 9.
Participant's opinion (frequency): example(s)
Item 6
Visual expression of interface (8): "…I think what I see may not be what I want to see…"
Esthetic feeling (10): "The interface is not nice. For example, I think the button is this function, but actually it is …function."
Function/expected functions (19): "Maybe my expected function is different from the actual result…"; "…its function doesn't seem to work the way I thought it would…"
Expected results (21): "I think it's the result I want and …is different."; "What I expected was different from what I actually saw…"
Operation/operating results (20): "…When I clicked on …I saw something…"; "I think the results of the operation I saw were not what I wanted…"
Item 9
Operation results (6): "I think it's the satisfaction of the operation that creates confidence."
Master the system/use it skillfully (13): "I think it's the satisfaction of the operation that gives confidence…"; "…Only when I can use it skillfully can I have confidence…"; "…I feel confident in my ability to master it…"
Achieve a specific goal (14): "…Maybe I have completed a task before I feel confident…"
Proficiency (17): "I felt confident after I became very proficient with the system…"
Table 5. Participant-informed alternative options for Items 6 and 9.
Item 6
1. I think there's a lot of inconsistency in operations and functions.
2. I think there are a lot of inconsistencies in the various functions of the system.
3. I think there are a lot of inconsistencies in the operation behavior and functions of this system.
4. In using the system, I found that many of the operating results were inconsistent with the expected functions.
5. When I use this system, the result of operation is inconsistent with the function of the system.
6. When I used this system, the results of many operations produced inconsistent functions.
Item 9
1. When I use this system, I feel confident about the results.
2. I feel very confident that I can master the system.
3. When I use this system, I feel confident that I can achieve every goal.
Table 6. Frequency of alternative options for Items 6 and 9.
Option:  No. 1  No. 2  No. 3  No. 4  No. 5  No. 6
Item 6:  2      1      3      13     5      3
Item 9:  7      15     5      NA     NA     NA
Table 7. Final Chinese SUS version.
1. 我愿意经常使用这个系统。
2. 我发现这个系统太复杂。
3. 我认为这个系统使用起来很容易。
4. 我认为我需要技术人员的帮助才能使用这个系统。
5. 我发现这个系统很好地集成了各种功能。
6. 在使用该系统的过程中,我发现很多操作结果和预想功能不一致。
7. 我想大多数用户能很快学会使用该系统。
8. 我发现这个系统使用起来很别扭。
9. 我对熟练掌握这个系统很有信心。
10. 在使用这个系统之前,我需要学习很多知识。
We added a survey item on usage frequency of the JD app (following general statistics practice, expressed as monthly rate of activity) ahead of the questionnaire to prevent interference with the first item. We designed two options with explanations: 1) almost no use or little use over the past two months; 2) frequent use over the past month or multiple uses daily. Items were scored on a five-point scale. The statement "This test does not assess your own abilities; it collects user feedback" was displayed prior to testing to assure participants that their personal capabilities were not under test. We also replaced the phrase "system" with "JD APP" throughout the questionnaire.
We appended a UMUX-LITE questionnaire (Lewis, Utesch, & Maher, 2013) after the SUS to determine the concurrent validity of our translation. According to previous research (Borsci et al., 2015), the correlation between UMUX-LITE and SUS is as high as 0.81, and UMUX-LITE is brief and easy to translate. Our expert-translated version is almost identical to the literal translation provided by Sauro and Lewis (2014). Although we did not report that translation process, no participant reported any question about any item after pretesting.
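For readers unfamiliar with UMUX-LITE, a sketch of its scoring as published by Lewis et al. (2013): the two 7-point items are rescaled to 0–100, and a regression adjustment (0.65 × raw + 22.9) aligns scores with the SUS scale. The function below is our illustration, not code from this study.

```python
def umux_lite(item1, item2, regression_adjusted=True):
    """UMUX-LITE score from two 7-point items (Lewis et al., 2013).

    The raw score rescales the two items to 0-100; the published
    regression adjustment (0.65 * raw + 22.9) places UMUX-LITE
    scores on the SUS scale.
    """
    raw = (item1 + item2 - 2) * (100.0 / 12.0)
    return 0.65 * raw + 22.9 if regression_adjusted else raw

# Example: ratings of 6 and 5 give a raw score of 75.0
# and an adjusted score of about 71.65.
print(round(umux_lite(6, 5), 2))
```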
4.5. Results
Basic statistics and normative comparison: The mean SUS score across all 217 participants was 72.77 (SD = 11.82), which is in line with results reported by Sauro and Lewis (2011). According to the CGS (Sauro & Lewis, 2014), the corresponding level is B- (72.6–74), and the 95% confidence interval runs from C+ (71.19) to B (74.35). Our result is also consistent with previous work by Kortum and Sorber (2015): it falls below the average for popular apps (76.1) but is in line with our predictions, because the JD app is not the most popular Chinese online shopping platform.
The UMUX-LITE mean across all 217 participants was 72.45 (SD = 10.18), 95% CI (71.00, 73.82). Infrequent users (n = 53) scored a mean of 63.27 (SD = 11.47), 95% CI (60.20, 66.23), and frequent users (n = 164) a mean of 75.42 (SD = 7.69), 95% CI (74.20, 76.51).
Sensitivity: The Chinese SUS version is sensitive to differences in frequency of use; that is, participants with different usage frequencies rated the app differently. The mean score of infrequent users was 63.87 (n = 53, SD = 13.33), 95% CI (60.19, 67.54), and that of frequent users was 75.64 (n = 164, SD = 9.71), 95% CI (74.14, 77.14). A two-sample independent-groups t-test showed the difference to be statistically significant (t(215) = 6.965, p < .01).
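A sketch of this sensitivity check using SciPy's independent-samples t-test; the two arrays below are random placeholders standing in for the per-participant SUS scores of the two usage-frequency groups, which we do not reproduce here.

```python
import numpy as np
from scipy import stats

# Placeholder arrays matching the reported group sizes and moments
# (n = 53 infrequent users, n = 164 frequent users).
infrequent = np.random.default_rng(1).normal(63.87, 13.33, 53)
frequent = np.random.default_rng(2).normal(75.64, 9.71, 164)

# Independent-groups t-test with pooled variance; df = n1 + n2 - 2 = 215.
t, p = stats.ttest_ind(infrequent, frequent)
print(f"t({len(infrequent) + len(frequent) - 2}) = {t:.3f}, p = {p:.4f}")
```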
Reliability: The Cronbach alpha we obtained is 0.84, 95% CI (0.807, 0.870), which is lower than that of the original English version (0.92) but exceeds the lower threshold of 0.7. This result is basically in line with other translated versions (Alghannam et al., 2018; Blažica & Lewis, 2015; Borsci, Federici, Mele, & Conti, 2015).
Concurrent validity: The overall correlation between UMUX-LITE and SUS scores was highly significant (r(215) = 0.807, p < .0001) and significantly greater than Nunnally's (1978) minimum criterion of 0.3 (95% CI from 0.755 to 0.848), consistent with results reported by Lewis, Utesch, and Maher (2015). For infrequent users the correlation is 0.743 and for frequent users 0.745, both much higher than the general 0.3 criterion (Nunnally, 1978).
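The concurrent-validity check reduces to a Pearson correlation between each participant's paired SUS and UMUX-LITE totals; a sketch with SciPy, again on placeholder data correlated by construction.

```python
import numpy as np
from scipy import stats

# Placeholder paired scores; in the study these would be each
# participant's SUS and UMUX-LITE totals (n = 217).
rng = np.random.default_rng(3)
sus = rng.normal(72.8, 11.8, 217)
umux_lite = 0.7 * sus + rng.normal(0, 5, 217)

# Pearson correlation with df = n - 2, compared against the 0.3 criterion.
r, p = stats.pearsonr(sus, umux_lite)
print(f"r({len(sus) - 2}) = {r:.3f}, p = {p:.2e}")
```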
Construct validity: Our KMO is 0.879, much higher than 0.7, indicating that the Chinese SUS version has good construct validity. Although the Varimax-rotated two-factor result for the Chinese version (Table 8) differs from the original version's factor analysis, recent studies have shown that the original SUS may have only one factor (Lewis & Sauro, 2017b).
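A sketch of this construct-validity analysis using the factor_analyzer Python package (the paper does not name its software, so the tooling here is our assumption): a KMO check followed by a Varimax-rotated two-factor extraction on a respondents-by-items table of SUS item scores.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo

def construct_validity(data: pd.DataFrame):
    """KMO measure plus a Varimax-rotated two-factor solution.

    data: respondents x 10 SUS items. A KMO above roughly 0.7
    suggests the correlation matrix is suitable for factor analysis.
    """
    _, kmo_model = calculate_kmo(data)
    fa = FactorAnalyzer(n_factors=2, rotation="varimax")
    fa.fit(data)
    loadings = pd.DataFrame(
        fa.loadings_, index=data.columns, columns=["Factor 1", "Factor 2"]
    )
    return kmo_model, loadings
```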
Figure 1. JD app homepage.
5. Experiment II
For an effective comparison against the literal SUS translation of Sauro and Lewis (2014), we designed Experiment II and re-tested the three items that varied most, namely Items 6, 8, and 9. We did not retest the overall questionnaire because the other items did not change significantly; if all items were measured simultaneously, it would be difficult to reduce interference from the retest effect.
5.1. Participants
We recruited 151 native Chinese-speaking participants, 83 male and 68 female undergraduate and graduate students, to conduct remote testing through recruitment or crowdsourcing platforms. For the Internet survey, we maintained the educational background restriction (Bachelor's degree or above) to ensure a consistent base-level understanding of Chinese. We did not deliberately recruit participants who had tested the JD app in Experiment I, but since the crowdsourcing platform is anonymous, it was not clear whether any participants had taken part in Experiment I.
5.2. Experimental design
The experiment used a within-group design: each participant completed both parts of the questionnaire in a single session. In keeping with Experiment I, we continued to utilize a retrospective assessment of the JD app.
5.3. Method
In line with Experiment I, the questionnaire was completed
online. Once the participant was recruited, he or she scanned
a QR code in WeChat and was directed to the test page. The
participants then filled in the questionnaire online according
to the given instructions.
5.4. Materials
The evaluation object was still the JD app, but the test materials differed from Experiment I and were divided into two parts. The first part is the literal translation of the SUS, which comes from the Chinese edition of "Quantifying the User Experience" (Sauro & Lewis, 2014); except for replacing the word "system" with "JD APP", we did not change the items' wording. The second part contains three items, taken from Items 6, 8, and 9 of our final version (Table 7). The purpose of this design was to compare, within the same group, changes in participants' evaluations before and after cultural adaptation.
Since the experiment used a within-group design, it was inevitably affected by the retest effect. There are three sources of retest effect (Arendasy & Sommer, 2017); for a short questionnaire, the most likely source is memory of previous responses. The retest effect can be minimized by using alternative forms and by increasing the time interval of the survey (Arendasy & Sommer, 2017). Our modified items are semantically equivalent to the original items but differ in wording, so they meet the requirements of alternative forms. To increase the time interval, after a participant completed the literal SUS translation, we inserted an intelligence-test item as a distraction; the second part of the questionnaire began after this item was completed.
5.5. Results
5.5.1. Basic statistics
The mean total score of the literal version is 70.54 (n = 151, SD = 11.07), 95% CI (68.77, 72.33), which corresponds to CGS level C (65–71). Compared to the results of Experiment I, the CGS difference spans two grades. The literal version's score is lower, but the difference is not significant (t(366) = −1.818, p = .07). The basic statistics and paired-sample t-tests comparing the three items of the literal version with those of our final version (Table 7) are shown in Table 9.
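The item-level comparisons in Table 9 are paired-samples t-tests; a sketch with SciPy, where the two arrays hold each participant's rating of the same item under the literal and the adapted wording (placeholder values here, not the study's data).

```python
import numpy as np
from scipy import stats

# Placeholder paired ratings for one item (n = 151): first array from
# the literal translation, second from the culturally adapted item.
rng = np.random.default_rng(4)
literal = rng.integers(1, 6, 151).astype(float)
adapted = np.clip(literal + rng.normal(0.1, 0.8, 151), 1, 5)

# Paired-samples t-test; df = n - 1 = 150, matching Table 9.
t, p = stats.ttest_rel(literal, adapted)
print(f"t({len(literal) - 1}) = {t:.2f}, p = {p:.3f}")
```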
5.5.2. Reliability
The literal version has a Cronbach alpha of 0.785, 95% CI (0.730, 0.833). This exceeds the minimum acceptable value of 0.7 and is not very different from other published translations, whose reliabilities range from 0.79 to 0.84 (Lewis, 2018b).
Table 8. Varimax-rotated two-factor solution, Chinese SUS version.
          Factor 1   Factor 2
Item 1     .643      −.047
Item 2     .676      .384
Item 3     .715      .410
Item 4     .195      .746
Item 5     .761      −.089
Item 6     .646      .311
Item 7     .551      .449
Item 8     .605      .491
Item 9     .415      .493
Item 10    −.111     .807

Table 9. Comparison of three items: literal Chinese SUS versus final version (cross-cultural adaptation).
        Literal version, mean (SD)   Final version, mean (SD)   t (df)        Sig. (2-tailed)
Item 6  6.11 (2.01)                  6.27 (2.04)                −0.98 (150)   0.328
Item 8  7.00 (2.21)                  7.04 (2.33)                −0.22 (150)   0.826
Item 9  7.25 (1.57)                  7.61 (1.64)                −2.86 (150)   0.005

5.5.3. Degree of correlation
The correlations between Items 6, 8, and 9 and the total score on the literal version are 0.604 (95% CI from 0.456 to 0.713), 0.756 (95% CI from 0.678 to 0.817), and 0.483 (95% CI from 0.312 to 0.615), respectively. In Experiment I, these correlations were 0.683 (95% CI from 0.593 to 0.757), 0.760 (95% CI from 0.682 to 0.815), and 0.633 (95% CI from 0.514 to 0.734), respectively.
On Item 6, the correlation between the literal version and the final version is 0.475, 95% CI (0.300, 0.635); for Item 8 it is 0.671, 95% CI (0.525, 0.778); and for Item 9 it is 0.527, 95% CI (0.384, 0.658). The internal correlations of these three item pairs are consistent with the large-sample inter-item correlations published for other SUS versions (Bangor et al., 2008; Lewis, 2018b).
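The 95% confidence intervals quoted for these correlations follow from the standard Fisher z-transformation; the sketch below is our reconstruction of that conventional method, not code from the study, and it reproduces, for example, the Item 8 interval on the literal version.

```python
import math

def pearson_r_ci(r, n, z_crit=1.96):
    """95% CI for a Pearson correlation via the Fisher z-transform.

    z = atanh(r) is approximately normal with SE = 1 / sqrt(n - 3);
    the interval endpoints are transformed back with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Item 8 vs. total score on the literal version (r = 0.756, n = 151).
lo, hi = pearson_r_ci(0.756, 151)
print(f"95% CI ({lo:.3f}, {hi:.3f})")  # approximately (0.678, 0.817)
```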
6. Discussion
To the best of our knowledge, most user evaluation practitioners in China currently utilize the literal translation of the SUS or another version established without user feedback. Given the large number of native Chinese users, our goal in this study was to establish an unambiguous version of the SUS that is understandable for native Chinese speakers, so we re-translated the Chinese version of the SUS. We translated Items 8 and 5 in a manner unlike any previously published version, and we added a structured interview after forward translation; this process constitutes a cross-cultural adaptation. Participants in the structured interviews evaluated the translation results, reported any misunderstanding of the expert translations, and offered alternative suggestions. The structured interviews revealed differing opinions on Items 6 and 9, so we modified the preliminary translation according to the participants' statements and the SUS dimensions. The final Chinese SUS version has good reliability and validity; it is also sensitive to differences in usage frequency and shows high concurrent validity.
In Experiment II, we set up a comparative study against the literally translated version of the SUS. First, we compared the literal translation with the cross-cultural adaptation (the final version we established). Data analysis shows that the reliability of the cross-cultural adaptation is 0.84 (95% CI from 0.807 to 0.870), while the reliability of the literal version is 0.785 (95% CI from 0.730 to 0.833). The reliability of the literal version is roughly equal to that of other translated versions (0.79–0.84) but appears somewhat lower than that of the original SUS, which usually exceeds 0.90. The cross-cultural adaptation's reliability of 0.84 shows some overlap in its 95% confidence interval with that of the literal version (0.785), suggesting that although the reliability of the cross-cultural adaptation has improved, the difference between the two versions is not significant. This is not surprising, as we altered only a few items rather than redesigning the entire SUS. Of course, more samples and verification on different kinds of systems are still needed to support this assertion.
In Experiment II, we also directly compared changes in the three items. The mean and standard deviation of all three items increased. Although the differences are mostly not significant, we did find changes caused by the cross-cultural adaptation. We also found that the correlations between the three items and the total score differ between the cross-cultural adaptation and the literal version. In Experiment II, the correlations between the three literal items and the total score are 0.604, 0.756, and 0.483, respectively; in Experiment I, the correlations between the three culturally adapted items and the total score are 0.683, 0.760, and 0.633. The difference for Item 9 was significant, with no overlap between the 95% confidence intervals; the differences for the other two items were not significant.
The difference in total score between the literal version and the cross-cultural adaptation is not significant, but the same difference spans two grade levels on the CGS. Cross-cultural adaptation does not serve to rewrite or undermine the previous translation, but to allow participants to understand it more easily. Assuming participants understand the meaning of the items accurately, then if they find the test material easier to use, the SUS total score may increase.
There is a possibility that social desirability response bias, the tendency to respond in a way that avoids criticism (Arnold & Feldman, 1981), together with the humility and moderation valued in Chinese culture, drove participants to give milder answers to ambiguous questions. Absent strong emotional stimulation, participants tend to give responses near zero or the baseline (Verduyn & Lavrijsen, 2015). If a participant does not know the specific meaning of an item, he or she will provide a conservative assessment to subconsciously avoid making mistakes; such assessments consistently approach an intermediate evaluation.
It is also possible that for an item that is ambiguous in a literal translation, the participant speculated about the item's meaning and responded based on that speculation. In the structured interviews, we likewise found that although the participants expressed ambiguity regarding some items in the literal translation, each participant's own guess or subjective interpretation was basically consistent with the single SUS factor, except for Item 6 (a few participants' guesses deviated from the SUS factor). Even if an item seemed ambiguous in the directly translated version, the participant could guess its general meaning. Therefore, even when such participants use the literally translated SUS, their scores are not much different from scores on the cross-culturally adapted version.
By comparing the average scores and standard deviations of the three items, we found that they all slightly increased, indicating that more accurate expression led to more accurate participant evaluation. The scores for the experimental material were generally above 70 points, which reveals a certain tendency away from the intermediate state (50 points, obtained if participants selected the median value on all items). A participant who does not tend to give an overly positive answer to an ambiguous question reduces the standard deviation of the data; conversely, if the questionnaire's presentation is very precise, participants can confidently commit to judgments away from the midpoint, which increases data dispersion and yields scores above the medium score. As a result, the scores of the three items on our final version varied to a greater extent than the scores of the corresponding items in the literal version.
The paired-sample t-test shows a significant difference for Item 9 between the literal and culturally adapted wordings, which may be due to social desirability bias on this item. The item seems to reflect the participants' ability or confidence in mastering the system; when it is expressed accurately, participants tend to report their achievements more positively, increasing the score. This may also be reflected in the increased correlation between Item 9 and the total score.
Overall, we did not observe significant differences between the scores of the three items in the literal translation and the culturally adapted translation. There are two possible reasons. First, the inherent usability of our experimental material may not be very high; the SUS score is only about 72, so even if participants very confidently make extreme judgments, the difference between the total score and the intermediate score is relatively small. Second, participants perform subjective analysis as they fill out questionnaires; because their speculation or subjective understanding roughly matches the SUS factor, the total score remains largely unchanged.
In general, it is unclear how precisely each participant treats items that are ambiguous or unclear. In the structured interviews, because our participants were skilled users, they could infer the meaning of ambiguous items relatively accurately. New users, or users who have never used a similar questionnaire, may speculate less accurately. Therefore, cross-cultural adaptation is necessary to minimize speculation and improve the accuracy of measurement.
7. Limitations and future work
We recruited relatively few participants for this study. To directly compare the changes before and after cross-cultural adaptation, we adopted a within-group design in Experiment II; this limitation was necessary to avoid the retest effect, as we could not let participants complete both full questionnaires, the literal translation and the final version, within a short period of time. We also separated the three items, 6, 8, and 9, in Experiment II to minimize the retest effect, which may itself have affected the results. With a larger sample, we could more accurately compare the changes in SUS scores after cross-cultural adaptation, and a between-groups design would further strengthen the results. In short, more research is still needed.
We plan to encourage more Chinese users to adopt the cross-culturally adapted SUS to measure the usability of products and systems. We will build a large-sample database of usability in the Chinese environment so that we can continue to study the factor structure of Chinese SUS versions. In addition, we plan to study whether the number of items on the Chinese SUS can be further reduced; for the English version, it has been found that dropping one item does not affect the overall reliability of the survey (Lewis & Sauro, 2017a). We also will report Chinese translations of all-positive SUS versions and of other usability scales such as UMUX and UMUX-LITE. Though we used a literal UMUX-LITE translation in this study, such literal versions require large samples to verify their reliability. Other researchers (Lewis, 2018a; Lewis et al., 2015) have reported a high correlation between UMUX-LITE and SUS; future work should include reporting correlations between the cross-culturally adapted SUS and Chinese versions of scales such as UMUX-LITE and UMUX.
References
Alghannam, B. A., Albustan, S. A., Al-Hassan, A. A., & Albustan, L. A. (2018). Towards a standard Arabic system usability scale: Psychometric evaluation using communication disorder app. International Journal of Human–Computer Interaction, 34(9), 799–804. doi:10.1080/10447318.2017.1388099
Arendasy, M. E., & Sommer, M. (2017). Reducing the effect size of the retest effect: Examining different approaches. Intelligence, 62, 89–98. doi:10.1016/j.intell.2017.03.003
Arnold, H. J., & Feldman, D. C. (1981). Social desirability response bias
in self-report choice situations. The Academy of Management Journal,
24(2), 377–385.
Assila, A., De Oliveira, K. M., & Ezzedine, H. (2016). Standardized
usability questionnaires: Features and quality focus. Electronic
Journal of Computer Science & Information Technology,6(1), 15–31.
Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An empirical evaluation of the System Usability Scale. International Journal of Human–Computer Interaction, 24, 574–594. doi:10.1080/10447310802205776
Blažica, B., & Lewis, J. R. (2015). A Slovene translation of the System Usability Scale: The SUS-SI. International Journal of Human-Computer Interaction, 31(2), 112–117. doi:10.1080/10447318.2014.986634
Borkowska, A., & Jach, K. (2017). Pre-testing of polish translation of
System Usability Scale (SUS). In Information systems architecture and
technology: Proceedings of 37th international conference on information
systems architecture and technology –ISAT 2016, Karpacz, Poland –
Part I (pp. 143–153). Springer International.
Borsci, S., Federici, S., Bacci, S., Gnaldi, M., & Bartolucci, F. (2015).
Assessing user satisfaction in the era of user experience: Comparison
of the sus, umux, and umux-lite as a function of product experience.
International Journal of Human-Computer Interaction,31(8),
484–495. doi:10.1080/10447318.2015.1064648
Borsci, S., Federici, S., & Lauriola, M. (2009). On the dimensionality of
the system usability scale: A test of alternative measurement models.
Cogn Process,10(3), 193–197. doi:10.1007/s10339-009-0268-9
Borsci, S., Federici, S., Mele, M. L., & Conti, M. (2015). Short scales of
satisfaction assessment: A proxy to involve disabled users in the usabil-
ity testing of websites. Human-Computer interaction: Users and con-
texts (pp. 35–42). Cham: Springer International Publishing.
Brooke, J. (1996). SUS: A "quick and dirty" usability scale. In P. Jordan, B. Thomas, & B. Weerdmeester (Eds.), Usability evaluation in industry (pp. 189–194). London, UK: Taylor & Francis.
Brooke, J. (2013). SUS: A retrospective. Usability Professionals’
Association,8(2), 29–40.
Cao, S., Cao, J., Li, S., Wang, W., Qian, Q., & Ding, Y. (2018). Cross-
cultural adaptation and validation of the simplified chinese version of
copenhagen hip and groin outcome score (hagos) for total hip
arthroplasty. Journal of Orthopaedic Surgery and Research,13(1),
278. doi:10.1186/s13018-018-0971-2
Cardoso, M. H., & Capellini, S. A. (2018). Translation and cross-cultural
adaptation of the detailed assessment of speed of handwriting 17+ to
brazilian portuguese: Conceptual, item and semantic equivalence.
Codas,30(1), e20170041. doi:10.1590/2317-1782/20182017041
Chen, R., Hao, Y., Feng, L., Zhang, Y., & Huang, Z. (2011). The Chinese
version of the pediatric quality of life inventory™(pedsql™) family
impact module: Cross-cultural adaptation and psychometric evalua-
tion. Health & Quality of Life Outcomes,9(1), 16. doi:10.1186/1477-
7525-9-16
Dianat, I., Ghanbari, Z., & Asgharijafarabadi, M. (2014). Psychometric
properties of the persian language version of the system usability scale.
Health Promotion Perspectives,4(1), 82–89. doi:10.5681/hpp.2014.011
Erdinç, O., & Lewis, J. R. (2013). Psychometric evaluation of the t-csuq:
The turkish version of the computer system usability questionnaire.
International Journal of Human-Computer Interaction,29(5),
319–326. doi:10.1080/10447318.2012.711702
Finstad, K. (2006). The system usability scale and non-native english
speakers. Journal of Usability Studies,1(4), 185–188.
Flavián, C., Guinalíu, M., & Gurrea, R. (2006). The role played by
perceived usability, satisfaction and consumer trust on website
loyalty. Information and Management,43(1), 1–14. doi:10.1016/j.
im.2005.01.002
Gao, Q., Zhu, B., Rau, P. L. P., Vyas, S., Chen, C., & Li, H. (2013). User
experience with chinese handwriting input on touch-screen mobile
phones. Cross-Cultural design. Methods, practice, and case studies
(pp. 384–392). Berlin, Heidelberg: Springer.
Herman, L. (1996). Towards effective usability evaluation in Asia:
Cross-Cultural differences. In Proceedings of the 6th Australian
Conference on Computer–Human Interaction (OZCHI'96), Sydney,
Australia (pp. 135–136). Washington, DC: IEEE Computer
Society.
Hilton, A., & Skrutkowski, M. (2002). Translating instruments into other
languages: Development and testing processes. Cancer Nursing,25(1),
1–7. doi:10.1097/00002820-200202000-00001
Jia, G., Zhou, J., Yang, P., Lin, C., Cao, X., Hu, H., & Ning, G. (2013). Integration of user centered design in the development of health monitoring system for elderly. 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 1748–1751). Osaka, Japan: IEEE.
Konstantina, O., Nikolaos, T., & Christos, K. (2015). Perceived usability
evaluation of learning management systems: Empirical evaluation of
the system usability scale. The International Review of Research in
Open and Distributed Learning,16(2), 227–246.
Kortum, P., & Sorber, M. (2015). Measuring the usability of mobile applica-
tions for phones and tablets. International Journal of Human-Computer
Interaction,31(8), 518–529. doi:10.1080/10447318.2015.1064658
Kortum, P. T., & Bangor, A. (2013). Usability ratings for everyday
products measured with the system usability scale. International
Journal of Human-Computer Interaction,29(2), 67–76. doi:10.1080/
10447318.2012.681221
Lei, J., Xu, L., Meng, Q., Zhang, J., & Gong, Y. (2014). The current status
of usability studies of information technologies in china: A systematic
study. BioMed Research International,2014, 568303.
Lewis, J. (2006). Usability testing. In: G. Salvendy (Ed.), Handbook of
human factors and ergonomics (pp. 1275–1316). Hoboken, NJ: John
Wiley and Sons.
Lewis, J., Utesch, B., & Maher, D. (2013). UMUX-LITE: When there's no time for the SUS. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2099–2102). Paris, France: ACM.
Lewis, J. R. (2018a). Measuring perceived usability: The csuq, sus, and
umux. International Journal of Human–Computer Interaction,34(12),
1148–1156. doi:10.1080/10447318.2017.1418805
Lewis, J. R. (2018b). The system usability scale: Past, present, and future.
International Journal of Human–Computer Interaction,34(7),
577–590. doi:10.1080/10447318.2018.1455307
Lewis, J. R., & Sauro, J. (2017a). Can I leave this one out? The effect of
dropping an item from the sus. Journal of Usability Studies,13(1), 38–46.
Lewis, J. R., & Sauro, J. (2017b). Revisiting the factor structure of system
usability scale. Journal of Usability Studies,12(4), 183–192.
Lewis, J. R., Utesch, B. S., & Maher, D. E. (2015). Investigating the
correspondence between umux-lite and sus scores. International con-
ference of design, user experience, and usability (pp. 204–211). Los
Angeles, CA: Springer.
Li, F., & Li, Y. (2011). Usability evaluation of e-commerce on b2c
websites in china. Procedia Engineering,15, 5299–5304. doi:10.1016/
j.proeng.2011.08.982
Li, R., Chen, Y. V., Sha, C., & Lu, Z. (2017). Effects of interface layout on
the usability of in-vehicle information systems and driving safety.
Displays,49, 124–132. doi:10.1016/j.displa.2017.07.008
Liu, Z. (2014). User experience in asia. Journal of Usability Studies,9(2),
42–50.
Liu, Z., Zhang, J., Zhang, H., & Chen, J. (2011). Usability in China. In I. Douglas & Z. Liu (Eds.), Global usability (pp. 111–135). London, UK: Springer.
Martinsa, A. I., Rosa, A. F., Queirós, A., & Silva, A. (2015). European
Portuguese validation of the System Usability Scale. Procedia
Computer Science,67, 293–300. doi:10.1016/j.procs.2015.09.273
Mcgee, M., Rich, A., & Dumas, J. (2004). Understanding the usability construct: User-perceived usability. Human Factors & Ergonomics Society Annual Meeting Proceedings, 48(5), 907–911. doi:10.1177/154193120404800535
Mohamad, M. M., Yaacob, N. A., & Yaacob, N. M. (2018). Translation,
cross-cultural adaptation, and validation of the malay version of the
system usability scale questionnaire for the assessment of mobile apps.
Jmir Human Factors,5(2), e10308. doi:10.2196/10308
Noiwan, J., & Norcio, A. F. (2006). Cultural differences on attention and
perceived usability: Investigating color combinations of animated
graphics. International Journal of Human-Computer Studies,64(2),
103–122. doi:10.1016/j.ijhcs.2005.06.004
Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-
Hill.
Park, J., Han, S. H., Kim, H. K., Cho, Y., & Park, W. (2013). Developing
elements of user experience for mobile phones and services: Survey,
interview, and observation approaches. Human Factors in Ergonomics
& Manufacturing,23(4), 279–293. doi:10.1002/hfm.20316
Pei, L., Xia, J., & Yan, J. (2010). Cross-cultural adaptation, reliability and
validity of the chinese version of the fear avoidance beliefs
questionnaire. Journal of International Medical Research,38(6),
1985–1996. doi:10.1177/147323001003800612
Rajanen, D., Clemmensen, T., Iivari, N., Inal, Y., Rızvanoğlu, K., Sivaji, A., … Boison, D. (2017). UX professionals' definitions of usability and UX: A comparison between Turkey, Finland, Denmark, France and Malaysia. INTERACT 2017: Human-Computer Interaction (pp. 218–239). Mumbai, India.
Rosa, A. F., Martins, A. I., Costa, V., Queirós, A., Silva, A., &
Rocha, N. P. (2015). European Portuguese validation of the
Post-Study System Usability Questionnaire (PSSUQ). 2015 10th
Iberian Conference on Information Systems and Technologies
(CISTI) (pp. 17–20). Aveiro, Portugal: IEEE.
Rubin, J., & Chisnell, D. (2008). Handbook of usability testing: How to
plan, design, and conduct effective tests. Hoboken, NJ: Wiley.
Sauro, J. (2010). Does better usability increase customer loyalty? Retrieved
from http://www.measuringusability.com/usability-loyalty.php
Sauro, J., & Lewis, J. R. (2009, April 4–9). Correlations among prototypi-
cal usability metrics: Evidence for the construct of usability.
Proceedings of the 27th International Conference on Human Factors
in Computing Systems, CHI 2009, Boston, MA: ACM.
Sauro, J., & Lewis, J. R. (2011). When designing usability question-
naires, does it hurt to be positive? Proceedings of CHI 2011 (pp.
2215–2223). Vancouver, Canada: Association for Computing
Machinery.
Sauro, J., & Lewis, J. R. (2014). Quantifying the user experience. Beijing,
China: China Machine Press.
Sharfina, Z., & Santoso, H. B. (2017). An Indonesian adaptation of the
System Usability Scale (SUS). International Conference on Advanced
Computer Science & Information Systems (pp. 145–148). Malang:
IEEE.
Shepard, L. A., Camilli, G., & Averill, M. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6(4), 317–375.
Sheu, F., Fu, H., & Shih, M. (2017). Pre-Testing the Chinese version
of the System Usability Scale (C-SUS). Workshop Proceedings of
the 25th International Conference on Computers in Education
(pp. 28–34). New Zealand: Asia-Pacific Society for Computers in
Education.
Tullis, T., & Albert, W. (2008). Measuring the user experience: Collecting,
analyzing, and presenting usability metrics (2nd ed.). Beijing, China:
Publishing House of Electronics Industry.
Tullis, T. S., & Stetson, J. N. (2004). A comparison of questionnaires for assessing website usability. Paper presented at the Usability Professionals Association Annual Conference (pp. 7–11). Minneapolis, MN: UPA.
Van de Vijver, A. J. R., & Rothmann, S. (2004). Assessment in multi-
cultural groups: The South African case. SA Journal of Industrial
Psychology,30(4), 1–7. doi:10.4102/sajip.v30i4.169
Van de vijver, F. J. R., & Leung, K. (1997). Methods and data analysis of
comparative research. In J. W. Berry, Y. H. Poortinga, & J. Pandey
(Eds.), Handbook of cross-cultural psychology (Vol. 1, 2nd ed., pp.
257–300). Boston, MA: Allyn & Bacon.
Van de vijver, F. J. R., & Poortinga, Y. H. (1997). Towards an integrated
analysis of bias in cross-cultural assessment. European Journal of
Psychological Assessment,13,29–37. doi:10.1027/1015-5759.13.1.29
Verduyn, P., & Lavrijsen, S. (2015). Which emotions last longest and
why: The role of event importance and rumination. Motivation and
Emotion,39(1), 119–127. doi:10.1007/s11031-014-9445-y
Vermeeren, A. P. O. S., Law, L. C., Roto, V., Obrist, M., Hoonhout, J., & Väänänen-Vainio-Mattila, K. (2010). User experience evaluation methods: Current state and development needs. Nordic Conference on Human-Computer Interaction (pp. 521–530). Reykjavik, Iceland. New York, NY: ACM.
Vijver, F. V. D., & Tanzer, N. K. (2004). Bias and equivalence in
cross-cultural assessment: An overview. Revue Européenne De
Psychologie Appliquée/european Review of Applied Psychology,54(2),
119–135. doi:10.1016/j.erap.2003.12.004
Wang, J. (2003). Human-computer interaction research and practice
in china. Interactions,10(2), 88–96. doi:10.1145/637848
Wang, Y., & Lv, F. (2017). Usability testing of ATM machines interface
based on eye tracking data. Chinese Journal of Ergonomics,23(1),
48–54.
Yan, Y., Wang, G., Liu, S., & Wu, H. (2012). Usability evaluation of
infusion pump based on system usability scale. China Medical Devices,
27(10), 25–27.
Yang, T., Linder, J., & Bolchini, D. (2012). DEEP: Design-oriented evaluation of perceived usability. International Journal of Human-Computer Interaction, 28(5), 308–346. doi:10.1080/10447318.2011.586320
About the Authors
Yuhui Wang is a researcher in the Industrial Design Department, Huazhong University of Science and Technology. His research interests include HCI design, usability assessment, and product design.
Tian Lei is an associate professor at the School of Mechanical Science and Engineering, Huazhong University of Science and Technology, where he is Secretary. His research covers information visualization in medicine and engineering, usability, and mobile HCI.
Xinxiong Liu is a professor in the Industrial Design Department, Huazhong University of Science and Technology. His research interests include usability and product design.