A Comparison of Questionnaires for Assessing Website Usability
Thomas S. Tullis and Jacqueline N. Stetson
Human Interface Design Department, Fidelity Center for Applied Technology
Fidelity Investments
82 Devonshire St., V4A
Boston, MA 02109
Contact: tom.tullis@fidelity.com
ABSTRACT:
Five questionnaires for assessing the usability of a website were compared in a study with
123 participants. The questionnaires studied were SUS, QUIS, CSUQ, a variant of
Microsoft’s Product Reaction Cards, and one that we have used in our Usability Lab for
several years. Each participant performed two tasks on each of two websites:
finance.yahoo.com and kiplinger.com. All five questionnaires revealed that one site was
significantly preferred over the other. The data were analyzed to determine what the
results would have been at different sample sizes from 6 to 14. At a sample size of 6, only
30-40% of the samples would have identified that one of the sites was significantly
preferred. Most of the data reach an apparent asymptote at a sample size of 12, where two
of the questionnaires (SUS and CSUQ) yielded the same conclusion as the full dataset at
least 90% of the time.
Introduction
A variety of questionnaires have been used and reported in the literature for assessing the
perceived usability of interactive systems, including QUIS [3], SUS [2], CSUQ [4], and
Microsoft’s Product Reaction Cards [1]. (See [5] for an overview.) In our Usability Lab, we
have been using our own questionnaire for the past several years for assessing subjective
reactions that participants in a usability test had to a website. However, we had concerns
about the reliability of our questionnaire (and others) given the relatively small number of
participants in most typical usability tests. Consequently, we decided to conduct a study to
determine the effectiveness of some of the standard questionnaires, plus our own, at
various sample sizes. Our focus was specifically on websites.
Method
We decided to limit ourselves to our own questionnaire plus those in the published literature
that we believed could be adapted to evaluating websites. The questionnaires we used were
as follows (illustrated in Appendix A):
1. SUS (System Usability Scale)—This questionnaire, developed at Digital Equipment
Corp., consists of ten questions. It was adapted by replacing the word “system” in
every question with “website”. Each question is a statement rated on a five-point scale from “Strongly Disagree” to “Strongly Agree”.
2. QUIS (Questionnaire for User Interface Satisfaction)—The original questionnaire,
developed at the University of Maryland, was composed of 27 questions. We
dropped three that did not seem to be appropriate to websites (e.g., “Remembering
names and use of commands”). The term “system” was replaced by “website”, and
the term “screen” was generally replaced by “web page”. Each question is a rating
on a ten-point scale with appropriate anchors at each end (e.g., “Overall Reaction to
the Website: Terrible … Wonderful”).
3. CSUQ (Computer System Usability Questionnaire)—This questionnaire, developed at
IBM, is composed of 19 questions. The term “system” or “computer system” was
replaced by “website”. Each question is a statement rated on a seven-point scale from “Strongly Disagree” to “Strongly Agree”.
4. Words (adapted from Microsoft’s Product Reaction Cards)—This questionnaire is
based on the 118 words used by Microsoft on their Product Reaction Cards [1]. (We
are grateful to Joey Benedek and Trish Miner of Microsoft for providing the complete
list.) Each word was presented with a check-box, and users were asked to choose the words that best described their interaction with the website. They were free to
choose as many or as few words as they wished.
5. Our Questionnaire—This is one that we have been using for several years in usability
tests of websites. It is composed of nine statements (e.g., “This website is visually
appealing”) to which the user responds on a seven-point scale from “Strongly
Disagree” to “Strongly Agree”. The points of the scale are numbered -3, -2, -1, 0, 1,
2, 3. Thus, there is an obvious neutral point at 0.
Note that other tools designed as commercial services for evaluating website usability (e.g.,
WAMMI [6], RelevantView [7], NetRaker [8], Vividence [9]) were not included in this study.
Some of these tools use their own proprietary questionnaires, and some allow you to construct your own.
The entire study was conducted online via our company’s Intranet. A total of 123 of our
employees participated in the study. Each participant was randomly assigned to one of the
five questionnaire conditions. Each was asked to perform two tasks on each of two well-
known personal financial information sites: finance.Yahoo.com and Kiplinger.com. (In the
rest of this paper they will simply be referred to as Site 1 and Site 2. No relationship
between the site numbers and site names should be assumed.) The two tasks were as
follows:
1. Find the highest price in the past year for a share of <company name>. (Note that a different company was used on each site.)
2. Find the mutual fund with the highest 3-year return.
The order of presentation of the two sites was randomized so that approximately half of the
participants received Site 1 first and half received Site 2 first. After completing (or at least
attempting) the two tasks on a site, the user was presented with the questionnaire for their
randomly selected condition. Thus, each user completed the same questionnaire for the
two sites. (Technically, “questionnaires” was a between-subjects variable and “sites” was a
within-subjects variable.)
Data Analysis
For each participant, an overall score was calculated for each website by simply averaging
all of the ratings on the questionnaire that was used. (All scales had been coded internally
so that the “better” end corresponded to higher numbers.) Since the various questionnaires
use different scales, these were converted to percentages by dividing each score by the
maximum score possible on that scale. So, for example, a rating of 3 on SUS was converted by dividing it by 5 (the maximum rating on its five-point scale), giving 60%.
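As an illustrative sketch of this conversion in Python (hypothetical code, not the analysis scripts actually used; it assumes ratings are already coded so that higher is better, and the -3 to +3 scale of the fifth questionnaire would first need shifting to a positive range):

```python
import numpy as np

# Maximum per-question rating for each instrument, per the descriptions
# above (SUS: 5-point, QUIS: 10-point, CSUQ: 7-point). The mapping and
# function names are ours, for illustration only.
MAX_RATING = {"SUS": 5, "QUIS": 10, "CSUQ": 7}

def overall_score(ratings, questionnaire):
    """Average one participant's ratings (coded so that higher = better)
    and express the result as a percentage of the maximum rating."""
    return np.mean(ratings) / MAX_RATING[questionnaire] * 100

# Example from the text: a SUS rating of 3 corresponds to 3/5 = 60%.
print(overall_score([3], "SUS"))  # 60.0
```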
Special treatment was required for the “Words” condition since it did not involve rating
scales. Before the study, we classified each of the words as being “Positive” (e.g.,
“Convenient”) or “Negative” (e.g., “Unattractive”). (Note that they were not grouped or
identified as such to the participants.) For each participant, an overall score was calculated
by dividing the number of “Positive” words that person selected by the total number of words they selected. Thus, if someone selected 8 positive words out of 10 words total, that yielded a score of 80%.
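The Words scoring can be sketched the same way (the word sets below are illustrative stand-ins, not Microsoft's actual 118-word list):

```python
def words_score(selected_words, positive_words):
    """Percentage of the selected words that were pre-classified as
    'Positive'; returns None if the participant selected no words."""
    if not selected_words:
        return None
    n_positive = sum(w in positive_words for w in selected_words)
    return n_positive / len(selected_words) * 100

# Illustrative subset only; the real instrument used 118 words.
positive = {"Convenient", "Useful", "Appealing", "Fast", "Clear",
            "Efficient", "Friendly", "Reliable"}
selected = ["Convenient", "Useful", "Appealing", "Fast", "Clear",
            "Efficient", "Friendly", "Reliable", "Slow", "Confusing"]
print(words_score(selected, positive))  # 80.0 (8 positive of 10 selected)
```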
Results
The random assignment of participants to the questionnaire conditions yielded between 19
and 28 participants for each questionnaire. The frequency distributions of their ratings on
each questionnaire for each site, converted to percentages as described above, are shown in
Figures 1 through 5. Figure 6 shows the average scores for each site using each
questionnaire.
[Figure 1: frequency distributions of SUS scores for Site 1 and Site 2 (x-axis: Percentage of Maximum Rating, 20%-100%; y-axis: Frequency, 0-60).]
Figure 1. Results using SUS.
[Figure 2: frequency distributions of QUIS scores for Site 1 and Site 2 (x-axis: Percentage of Maximum Rating, 20%-100%; y-axis: Frequency, 0-200).]
Figure 2. Results using QUIS.
[Figure 3: frequency distributions of CSUQ scores for Site 1 and Site 2 (x-axis: Percentage of Maximum Rating, 20%-100%; y-axis: Frequency, 0-160).]
Figure 3. Results using CSUQ.
[Figure 4: frequency distributions of Words scores for Site 1 and Site 2 (x-axis: Percentage of Maximum Score, 20%-100%; y-axis: Frequency, 0-12).]
Figure 4. Results using Microsoft’s Words.
[Figure 5: frequency distributions of scores on our questionnaire for Site 1 and Site 2 (x-axis: Percentage of Maximum Score, 20%-100%; y-axis: Frequency, 0-120).]
Figure 5. Results using our questionnaire.
[Figure 6: bar chart of mean scores for Site 1 and Site 2 on each questionnaire (x-axis: Survey (SUS, QUIS, CSUQ, Words, Ours); y-axis: Mean Score, 0%-80%). Site 1 means fall between 66% and 74%; Site 2 means fall between 38% and 52%.]
Figure 6. Comparison of mean scores for each site using each questionnaire.
All five questionnaires showed that Site 1 was significantly preferred over Site 2 (p<.01 via
t-test for each). The largest mean difference (74% vs. 38%) was found using the Words
questionnaire, but this was also the questionnaire that yielded the greatest variability in the
responses. Both of these points are apparent from examination of Figure 4, where you can
see that the modal values for the two sites are at the opposite ends of the scale, but there
are some responses for both sites across the entire range.
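The exact form of t-test is not stated; because “sites” was a within-subjects variable (each participant rated both sites), a paired t-test is the natural reading, sketched here with hypothetical score pairs:

```python
import numpy as np
from scipy import stats

# Hypothetical paired percentage scores, one pair per participant on a
# given questionnaire (Site 1 score, Site 2 score).
site1 = np.array([72.0, 80.0, 65.0, 78.0, 70.0, 85.0, 74.0, 69.0])
site2 = np.array([50.0, 55.0, 42.0, 60.0, 48.0, 58.0, 51.0, 45.0])

t_stat, p_value = stats.ttest_rel(site1, site2)  # paired t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")    # Site 1 preferred if t > 0 and p < .01
```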
The most interesting thing to look at now is what the results would have been using each
questionnaire if the study had been done with a smaller number of participants. We chose to analyze randomly selected sub-samples of the data at sizes of 6, 8, 10, 12, and 14, which we felt represented sample sizes commonly used in usability tests. This was an empirical
sub-sampling in which 20 random sub-samples were taken from the full dataset at each of
these different sample sizes, and a t-test was conducted to determine whether the results
showed that Site 1 was significantly better than Site 2 (the conclusion from the full
dataset). Figure 7 shows the results of this random sub-sampling.
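A sketch of this sub-sampling procedure (again hypothetical: the simulated scores stand in for the real data, and the alpha of .05 for the sub-sample tests is our assumption, since the paper reports p<.01 only for the full dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical paired scores for 25 participants on one questionnaire,
# standing in for the real data (Site 1 rated higher on average).
site1 = rng.normal(73, 12, size=25).clip(0, 100)
site2 = rng.normal(49, 12, size=25).clip(0, 100)

def subsample_accuracy(n, n_draws=20, alpha=0.05):
    """Fraction of n_draws random sub-samples of n participants whose
    paired t-test reproduces the full-dataset conclusion (Site 1
    significantly preferred over Site 2)."""
    hits = 0
    for _ in range(n_draws):
        # Sample participants without replacement, keeping each
        # participant's pair of site scores together.
        idx = rng.choice(len(site1), size=n, replace=False)
        t_stat, p_value = stats.ttest_rel(site1[idx], site2[idx])
        if p_value < alpha and t_stat > 0:  # significant, right direction
            hits += 1
    return hits / n_draws

for n in (6, 8, 10, 12, 14):
    print(n, subsample_accuracy(n))
```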
[Figure 7: line chart of the percentage of "correct" conclusions (y-axis, 20%-100%) as a function of sample size (x-axis, 6-14), with one line per questionnaire (SUS, QUIS, CSUQ, Words, Ours).]
Figure 7. Data based on t-tests of random sub-samples of various sizes. Twenty sub-samples were taken at each sample size for each site and each questionnaire. What is plotted is the percentage of those 20 tests that yielded the same conclusion as the analysis of the full dataset (that Site 1 was significantly preferred over Site 2).
As one would expect, the accuracy of the analysis increases as the sample size gets larger.
With a sample size of only 6, all of the questionnaires yield accuracy of only 30-40%,
meaning that 60-70% of the time, at that sample size, you would fail to find a significant
difference between the two sites. Interestingly, the accuracy of some of the questionnaires increases more quickly than others. For example, SUS jumps up to about 75% accuracy at a
sample size of 8, while the others stay down in the 40-55% range. It’s also interesting to
note that most of the questionnaires appear to reach an asymptote at a sample size of 12.
The improvement by going to a sample size of 14 is small in most cases. Also, due to the
different variances of the responses, some of the questionnaires reach a higher asymptote
than others. For example, SUS and CSUQ reach asymptotes of 90-100% while the others
are in the 70-75% range. Of course, the other questionnaires would have continued to
yield improvement if larger samples had been tested.
Conclusions
First, some caveats need to be pointed out about the interpretation of these data. The
primary one is that they really only directly apply to the analysis of the two sites that we
studied. We selected two popular sites that provide financial information,
finance.Yahoo.com and Kiplinger.com. We chose these sites because they provide similar
kinds of information but in different ways. Had the two sites studied been even more
similar to each other, it would have been more difficult for any of the questionnaires to yield
a significant difference. Likewise, if they had been more different, it would have been easier
for any of the questionnaires to yield a significant difference.
Another caveat is that the users’ assessments of these sites were undoubtedly affected by
the two tasks that we asked them to do on those sites. Again, we did not choose tasks that we thought would be notably easier or more difficult on one site than on the other. We
chose tasks that we thought were typical of the tasks people might want to do on these
kinds of sites.
It’s also possible that the results could have been somewhat different if we had been able to
collect data from more participants using each questionnaire. The minimum number of
participants that we got for any one questionnaire was 19. Some researchers have argued
that still larger numbers of participants are needed to get reliable data from some of these
questionnaires. While that may be true, one of our goals was to study whether any of these
questionnaires yield reliable results at the smaller sample sizes typically seen in usability
tests.
Finally, this paper has only addressed the question of whether a given questionnaire was
able to reliably distinguish between the ratings of one site vs. the other. In many usability
tests, you have only one design that you are evaluating, not two or more that you are
comparing. When evaluating only one design, possibly the most important information is
related to the diagnostic value of the data you get from the questionnaire. In other words,
how well does it help guide improvements to the design? That has not been analyzed in this
study. Interestingly, on the surface at least, it appears that the Microsoft Words questionnaire might provide the most diagnostic information, due to the potentially large number of descriptors
involved.
Keeping all of those caveats in mind, it is interesting to note that one of the simplest
questionnaires studied, SUS (with only 10 rating scales), yielded among the most reliable
results across sample sizes. It is also interesting that SUS is the only questionnaire of those
studied whose questions all address different aspects of the user’s reaction to the website
as a whole (e.g., “I found the website unnecessarily complex”, “I felt very confident using
the website”) as opposed to asking the user to assess specific features of the website (e.g., visual appearance, organization of information). These results also indicate that, for
the conditions of this study, sample sizes of at least 12-14 participants are needed to get
reasonably reliable results.
REFERENCES
1. Benedek, J., & Miner, T. (2002). Measuring desirability: New methods for evaluating
desirability in a usability lab setting. Proceedings of UPA 2002 Conference, Orlando, FL,
July 8-12, 2002.
2. Brooke, J. (1996). SUS: A Quick and Dirty Usability Scale. In: P.W. Jordan, B. Thomas,
B.A. Weerdmeester & I.L. McClelland (Eds.), Usability Evaluation in Industry. London:
Taylor & Francis. (Also see http://www.cee.hw.ac.uk/~ph/sus.html)
3. Chin, J. P., Diehl, V. A., & Norman, K. (1988). Development of an instrument measuring
user satisfaction of the human-computer interface, Proceedings of ACM CHI '88
(Washington, DC), pp. 213-218. (Also see
http://www.acm.org/~perlman/question.cgi?form=QUIS and
http://www.lap.umd.edu/QUIS/index.html)
4. Lewis, J. (1995). IBM Computer Usability Satisfaction Questionnaires: Psychometric
Evaluation and Instructions for Use. International Journal of Human-Computer Interaction, 7(1), 57-78. (Also see
http://www.acm.org/~perlman/question.cgi?form=CSUQ)
5. Perlman, G. (Undated). Web-Based User Interface Evaluation with Questionnaires.
Retrieved from http://www.acm.org/~perlman/question.html on Nov. 7, 2003.
6. WAMMI: http://www.wammi.com
7. RelevantView: http://www.relevantview.com/
8. NetRaker: http://www.netraker.com/
9. Vividence: http://www.vividence.com/
Appendix A: Screenshots of the Five Questionnaires Used
[Screenshot: SUS questionnaire]
[Screenshot: QUIS questionnaire]
[Screenshot: CSUQ questionnaire]
[Screenshot: Words questionnaire (based on Microsoft’s Product Reaction Cards)]
[Screenshot: Our questionnaire]