A Comparison of Questionnaires for Assessing Website Usability
Thomas S. Tullis and Jacqueline N. Stetson
Human Interface Design Department, Fidelity Center for Applied Technology
Fidelity Investments
82 Devonshire St., V4A
Boston, MA 02109
Contact: tom.tullis@fidelity.com
ABSTRACT:
Five questionnaires for assessing the usability of a website were compared in a study with
123 participants. The questionnaires studied were SUS, QUIS, CSUQ, a variant of
Microsoft’s Product Reaction Cards, and one that we have used in our Usability Lab for
several years. Each participant performed two tasks on each of two websites:
finance.yahoo.com and kiplinger.com. All five questionnaires revealed that one site was
significantly preferred over the other. The data were analyzed to determine what the
results would have been at different sample sizes from 6 to 14. At a sample size of 6, only
30-40% of the samples would have identified that one of the sites was significantly
preferred. Most of the data reach an apparent asymptote at a sample size of 12, where two
of the questionnaires (SUS and CSUQ) yielded the same conclusion as the full dataset at
least 90% of the time.
Introduction
A variety of questionnaires have been used and reported in the literature for assessing the
perceived usability of interactive systems, including QUIS [3], SUS [2], CSUQ [4], and
Microsoft’s Product Reaction Cards [1]. (See [5] for an overview.) In our Usability Lab, we
have been using our own questionnaire for the past several years for assessing subjective
reactions that participants in a usability test had to a website. However, we had concerns
about the reliability of our questionnaire (and others) given the relatively small number of
participants in most typical usability tests. Consequently, we decided to conduct a study to
determine the effectiveness of some of the standard questionnaires, plus our own, at
various sample sizes. Our focus was specifically on websites.
Method
We decided to limit ourselves to our own questionnaire plus those in the published literature
that we believed could be adapted to evaluating websites. The questionnaires we used were
as follows (illustrated in Appendix A):
1. SUS (System Usability Scale)—This questionnaire, developed at Digital Equipment
Corp., consists of ten questions. It was adapted by replacing the word “system” in
every question with “website”. Each question is a statement that the participant rates on a
five-point scale from “Strongly Disagree” to “Strongly Agree”.
2. QUIS (Questionnaire for User Interface Satisfaction)—The original questionnaire,
developed at the University of Maryland, was composed of 27 questions. We
dropped three that did not seem to be appropriate to websites (e.g., “Remembering
names and use of commands”). The term “system” was replaced by “website”, and
the term “screen” was generally replaced by “web page”. Each question is a rating
on a ten-point scale with appropriate anchors at each end (e.g., “Overall Reaction to
the Website: Terrible … Wonderful”).
3. CSUQ (Computer System Usability Questionnaire)—This questionnaire, developed at
IBM, is composed of 19 questions. The term “system” or “computer system” was
replaced by “website”. Each question is a statement that the participant rates on a
seven-point scale from “Strongly Disagree” to “Strongly Agree”.
4. Words (adapted from Microsoft’s Product Reaction Cards)—This questionnaire is
based on the 118 words used by Microsoft on their Product Reaction Cards [1]. (We
are grateful to Joey Benedek and Trish Miner of Microsoft for providing the complete
list.) Each word was presented with a check-box, and participants were asked to choose the
words that best described their interaction with the website. They were free to choose as
many or as few words as they wished.
5. Our Questionnaire—This is one that we have been using for several years in usability
tests of websites. It is composed of nine statements (e.g., “This website is visually
appealing”) to which the user responds on a seven-point scale from “Strongly
Disagree” to “Strongly Agree”. The points of the scale are numbered -3, -2, -1, 0, 1,
2, 3. Thus, there is an obvious neutral point at 0.
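For reference, the response formats described above can be collected into a small configuration structure. The following Python sketch simply restates those descriptions; the variable and field names are ours and are not part of any of the questionnaires.

# Summary of the five questionnaires as adapted for this study.
# "items" is the number of questions (or words) presented to each participant.
QUESTIONNAIRES = {
    "SUS":   {"items": 10,  "scale": "5-point, Strongly Disagree .. Strongly Agree"},
    "QUIS":  {"items": 24,  "scale": "10-point, anchored per question (e.g., Terrible .. Wonderful)"},
    "CSUQ":  {"items": 19,  "scale": "7-point, Strongly Disagree .. Strongly Agree"},
    "Words": {"items": 118, "scale": "check-box word choices (no rating scale)"},
    "Ours":  {"items": 9,   "scale": "7-point, numbered -3 to +3, Strongly Disagree .. Strongly Agree"},
}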
Note that other tools designed as commercial services for evaluating website usability (e.g.,
WAMMI [6], RelevantView [7], NetRaker [8], Vividence [9]) were not included in this study.
Some of these tools use their own proprietary questionnaires, and some allow you to
construct your own.
The entire study was conducted online via our company’s Intranet. A total of 123 of our
employees participated in the study. Each participant was randomly assigned to one of the
five questionnaire conditions. Each was asked to perform two tasks on each of two well-
known personal financial information sites: finance.Yahoo.com and Kiplinger.com. (In the
rest of this paper they will simply be referred to as Site 1 and Site 2. No relationship
between the site numbers and site names should be assumed.) The two tasks were as
follows:
1. Find the highest price in the past year for a share of <company name>. (Note that a
different company was used on each site.)
2. Find the mutual fund with the highest 3-year return.
The order of presentation of the two sites was randomized so that approximately half of the
participants received Site 1 first and half received Site 2 first. After completing (or at least
attempting) the two tasks on a site, the user was presented with the questionnaire for their
randomly selected condition. Thus, each user completed the same questionnaire for the
two sites. (Technically, “questionnaires” was a between-subjects variable and “sites” was a
within-subjects variable.)
Data Analysis
For each participant, an overall score was calculated for each website by simply averaging
all of the ratings on the questionnaire that was used. (All scales had been coded internally
so that the “better” end corresponded to higher numbers.) Since the various questionnaires
use different scales, these were converted to percentages by dividing each score by the
maximum score possible on that scale. So, for example, a rating of 3 on SUS, whose
five-point scale has a maximum of 5, was converted to a percentage by dividing 3 by 5,
giving 60%.
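As a concrete illustration of this conversion, the following Python sketch averages one participant's item ratings and expresses the result as a percentage of the scale maximum. The function name is ours, and the sketch assumes the ratings have already been coded so that the "better" end is the higher number, as described above.

def score_as_percentage(ratings, scale_max):
    # Average the item ratings and express the result as a percentage of the
    # maximum possible rating on that scale (higher = better).
    mean_rating = sum(ratings) / len(ratings)
    return 100.0 * mean_rating / scale_max

# Worked example from the text: a rating of 3 on SUS's 5-point scale -> 60.0
print(score_as_percentage([3], scale_max=5))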
Special treatment was required for the “Words” condition since it did not involve rating
scales. Before the study, we classified each of the words as either “Positive” (e.g.,
“Convenient”) or “Negative” (e.g., “Unattractive”). (Note that they were not grouped or
identified as such to the participants.) For each participant, an overall score was calculated
by dividing the number of “Positive” words that person selected by the total number of
words selected. Thus, if someone selected 10 words in total, 8 of which were positive, that
yielded a score of 80%.
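The corresponding calculation for the Words condition is a simple proportion. In the Python sketch below (the function name and argument names are ours), the score is the percentage of the words a participant selected that had been classified beforehand as "Positive".

def words_score(selected_words, positive_words):
    # Percentage of the selected words that were classified as "Positive".
    if not selected_words:
        return None  # participant chose no words; score is undefined
    positive_count = sum(1 for word in selected_words if word in positive_words)
    return 100.0 * positive_count / len(selected_words)

# Example from the text: 8 positive words out of 10 selected -> 80.0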
Results
The random assignment of participants to the questionnaire conditions yielded between 19
and 28 participants for each questionnaire. The frequency distributions of their ratings on
each questionnaire for each site, converted to percentages as described above, are shown in
Figures 1 through 5. Figure 6 shows the average scores for each site using each
questionnaire.
Figure 1. Results using SUS. (Frequency distribution of scores, as percentage of maximum rating, for Site 1 and Site 2.)
Figure 2. Results using QUIS. (Frequency distribution of scores, as percentage of maximum rating, for Site 1 and Site 2.)
Figure 3. Results using CSUQ. (Frequency distribution of scores, as percentage of maximum rating, for Site 1 and Site 2.)
Figure 4. Results using Microsoft’s Words. (Frequency distribution of scores, as percentage of maximum score, for Site 1 and Site 2.)
Figure 5. Results using our questionnaire. (Frequency distribution of scores, as percentage of maximum score, for Site 1 and Site 2.)
Figure 6. Comparison of mean scores for each site (Site 1 vs. Site 2) using each questionnaire (SUS, QUIS, CSUQ, Words, Ours).
All five questionnaires showed that Site 1 was significantly preferred over Site 2 (p<.01 via
t-test for each). The largest mean difference (74% vs. 38%) was found using the Words
questionnaire, but this was also the questionnaire that yielded the greatest variability in the
responses. Both of these points are apparent from examination of Figure 4, where you can
see that the modal values for the two sites are at the opposite ends of the scale, but there
are some responses for both sites across the entire range.
The most interesting question is what the results would have been with each questionnaire
if the study had been conducted with fewer participants. We chose to analyze randomly
selected sub-samples of the data at sizes of 6, 8, 10, 12, and 14, which we felt represented
sample sizes commonly used in usability tests. This was an empirical
sub-sampling in which 20 random sub-samples were taken from the full dataset at each of
these different sample sizes, and a t-test was conducted to determine whether the results
showed that Site 1 was significantly better than Site 2 (the conclusion from the full
dataset). Figure 7 shows the results of this random sub-sampling.
% of "Correct" Conclusions
20%
30%
40%
50%
60%
70%
80%
90%
100%
6 8 10 12 14
Sample Size
SUS
QUIS
CSUQ
Words
Ours
Figure 7. Data based on t-tests of random sub-samples of various sizes. Twenty sub-
samples were taken at each sample size for each site and each questionnaire. What is
plotted is the percentage of those 20 tests that yielded the same conclusion as the
analysis of the full dataset (that Site 1 was significantly preferred over Site 2).
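A minimal sketch of this sub-sampling procedure is given below, using Python with SciPy. The paper does not state whether the t-tests were paired or independent, or what alpha level was used; the sketch assumes a paired, two-tailed test at alpha = .05 (each participant rated both sites), so it should be read as an approximation of the analysis rather than a reproduction of it.

import random
from scipy import stats

def pct_correct_conclusions(site1_scores, site2_scores, sample_size,
                            n_resamples=20, alpha=0.05):
    # site1_scores and site2_scores are paired lists: element i of each comes
    # from the same participant. Returns the percentage of random sub-samples
    # of the given size that reproduce the full-dataset conclusion (Site 1
    # significantly preferred over Site 2).
    n = len(site1_scores)
    hits = 0
    for _ in range(n_resamples):
        idx = random.sample(range(n), sample_size)   # random sub-sample of participants
        s1 = [site1_scores[i] for i in idx]
        s2 = [site2_scores[i] for i in idx]
        t, p = stats.ttest_rel(s1, s2)               # paired t-test (assumption)
        if p < alpha and t > 0:                      # significant, and in the same direction
            hits += 1
    return 100.0 * hits / n_resamples

# For each questionnaire, this would be run at sample sizes 6, 8, 10, 12, and 14.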
As one would expect, the accuracy of the analysis increases as the sample size gets larger.
With a sample size of only 6, all of the questionnaires yield accuracy of only 30-40%,
meaning that 60-70% of the time, at that sample size, you would fail to find a significant
difference between the two sites. Interestingly, the accuracy of some of the questionnaires
increases more quickly than that of others. For example, SUS jumps to about 75% accuracy
at a sample size of 8, while the others remain in the 40-55% range. It is also interesting to
note that most of the questionnaires appear to reach an asymptote at a sample size of 12;
the improvement from going to a sample size of 14 is small in most cases. Also, due to the
different variances of the responses, some of the questionnaires reach a higher asymptote
than others. For example, SUS and CSUQ reach asymptotes of 90-100% while the others
are in the 70-75% range. Of course, the other questionnaires would have continued to
yield improvement if larger samples had been tested.
Conclusions
First, some caveats need to be pointed out about the interpretation of these data. The
primary one is that the results directly apply only to the two sites that we studied. We
selected two popular sites that provide financial information,
finance.Yahoo.com and Kiplinger.com. We chose these sites because they provide similar
kinds of information but in different ways. Had the two sites studied been even more
similar to each other, it would have been more difficult for any of the questionnaires to yield
a significant difference. Likewise, if they had been more different, it would have been easier
for any of the questionnaires to yield a significant difference.
Another caveat is that the users’ assessments of these sites were undoubtedly affected by
the two tasks that we asked them to do on those sites. Again, we did not choose tasks that
we expected to be noticeably easier or more difficult on one site than on the other. We
chose tasks that we thought were typical of the tasks people might want to do on these
kinds of sites.
It’s also possible that the results could have been somewhat different if we had been able to
collect data from more participants using each questionnaire. The minimum number of
participants that we got for any one questionnaire was 19. Some researchers have argued
that still larger numbers of participants are needed to get reliable data from some of these
questionnaires. While that may be true, one of our goals was to study whether any of these
questionnaires yield reliable results at the smaller sample sizes typically seen in usability
tests.
Finally, this paper has only addressed the question of whether a given questionnaire was
able to reliably distinguish between the ratings of one site vs. the other. In many usability
tests, you have only one design that you are evaluating, not two or more that you are
comparing. When evaluating only one design, possibly the most important information is
related to the diagnostic value of the data you get from the questionnaire. In other words,
how well does it help guide improvements to the design? That has not been analyzed in this
study. Interestingly, on the surface at least, it appears that the Microsoft Words
questionnaire might provide the most diagnostic information, due to the potentially large
number of descriptors involved.
Keeping all of those caveats in mind, it is interesting to note that one of the simplest
questionnaires studied, SUS (with only 10 rating scales), yielded among the most reliable
results across sample sizes. It is also interesting that SUS is the only questionnaire of those
studied whose questions all address different aspects of the user’s reaction to the website
as a whole (e.g., “I found the website unnecessarily complex”, “I felt very confident using
the website”) as opposed to asking the user to assess specific features of the website (e.g.,
visual appearance, organization of information, etc.). These results also indicate that, for
the conditions of this study, sample sizes of at least 12-14 participants are needed to get
reasonably reliable results.
REFERENCES
1. Benedek, J., & Miner, T. (2002). Measuring desirability: New methods for evaluating
desirability in a usability lab setting. Proceedings of UPA 2002 Conference, Orlando, FL,
July 8-12, 2002.
2. Brooke, J. (1996). SUS: A Quick and Dirty Usability Scale. In: P.W. Jordan, B. Thomas,
B.A. Weerdmeester & I.L. McClelland (Eds.), Usability Evaluation in Industry. London:
Taylor & Francis. (Also see http://www.cee.hw.ac.uk/~ph/sus.html)
3. Chin, J. P., Diehl, V. A., & Norman, K. (1988). Development of an instrument measuring
user satisfaction of the human-computer interface, Proceedings of ACM CHI '88
(Washington, DC), pp. 213-218. (Also see
http://www.acm.org/~perlman/question.cgi?form=QUIS and
http://www.lap.umd.edu/QUIS/index.html)
4. Lewis, J. (1995). IBM Computer Usability Satisfaction Questionnaires: Psychometric
Evaluation and Instructions for Use. International Journal of Human-Computer
Interaction, 7(1), 57-78. (Also see
http://www.acm.org/~perlman/question.cgi?form=CSUQ)
5. Perlman, G. (Undated). Web-Based User Interface Evaluation with Questionnaires.
Retrieved from http://www.acm.org/~perlman/question.html on Nov. 7, 2003.
6. WAMMI: http://www.wammi.com
7. RelevantView: http://www.relevantview.com/
8. NetRaker: http://www.netraker.com/
9. Vividence: http://www.vividence.com/
Appendix A: Screenshots of the Five Questionnaires Used
SUS
QUIS
CSUQ
Words (based on Microsoft’s Product Reaction Cards)
Our Questionnaire