Conference Paper

Investigating the Correspondence Between UMUX-LITE and SUS Scores

Authors:
  • MeasuringU

Abstract

The UMUX-LITE is a two-item questionnaire that assesses perceived usability. In previous research it correlated highly with the System Usability Scale (SUS) and, with appropriate adjustment using a regression formula, had close correspondence to the magnitude of SUS scores, enabling its comparison with emerging SUS norms. Those results, however, were based on the data used to compute the regression formula. In this paper we describe a study conducted to investigate the quality of the published formula using independent data. The formula worked well. As expected, the correlation between the SUS and UMUX-LITE was significant and substantial, and the overall mean difference between their scores was just 1.1 points, about 1% of the range of values the questionnaires can take, verifying the efficacy of the regression formula.
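The regression adjustment described in the abstract can be sketched in a few lines. This is a minimal sketch, assuming two 7-point items and the coefficients commonly cited for the published formula: the raw score is rescaled to 0-100 as (item1 + item2 - 2) × 100/12, and the adjusted score is 0.65 × raw + 22.9. Check the coefficients against the original paper before relying on them; the function names are illustrative, not from any library.

```python
from statistics import mean

# Assumed coefficients of the published UMUX-LITE -> SUS regression formula.
SLOPE = 0.65
INTERCEPT = 22.9


def umux_lite_raw(item1: int, item2: int) -> float:
    """Rescale two 7-point responses (1-7) to a 0-100 UMUX-LITE score."""
    for item in (item1, item2):
        if not 1 <= item <= 7:
            raise ValueError(f"response {item} is outside the 1-7 range")
    return (item1 + item2 - 2) * 100 / 12


def umux_lite_r(item1: int, item2: int) -> float:
    """Regression-adjusted UMUX-LITE, intended to match the magnitude of SUS scores."""
    return SLOPE * umux_lite_raw(item1, item2) + INTERCEPT


def mean_difference(sus_scores: list[float], umux_pairs: list[tuple[int, int]]) -> float:
    """Mean SUS minus mean adjusted UMUX-LITE for the same respondents, in SUS points."""
    adjusted = [umux_lite_r(a, b) for a, b in umux_pairs]
    return mean(sus_scores) - mean(adjusted)
```

Because the adjustment is linear, the adjusted score can only range from 22.9 (both items at 1) to 87.9 (both items at 7), so it is best read as a SUS-comparable estimate rather than a full 0-100 rescaling.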
... Figure 7 provides a visual representation of the three scales' results. We display the UMUX-Lite as the SUS parity score for comparison [34]. According to the rating scale by Bangor et al. [2], this places all systems in the "good" category for system usability with PromptMap slightly in the lead. ...
... The results are depicted in Figure 8. ... NASA-TLX [25] and UMUX-Lite [33]. The latter is reported as a SUS [34] parity score. No significant differences between the conditions were found. ...
Preprint
Recent technological advances popularized the use of image generation among the general public. Crafting effective prompts can, however, be difficult for novice users. To tackle this challenge, we developed PromptMap, a new interaction style for text-to-image AI that allows users to freely explore a vast collection of synthetic prompts through a map-like view with semantic zoom. PromptMap groups images visually by their semantic similarity, allowing users to discover relevant examples. We evaluated PromptMap in a between-subject online study (n=60) and a qualitative within-subject study (n=12). We found that PromptMap supported users in crafting prompts by providing them with examples. We also demonstrated the feasibility of using LLMs to create vast example collections. Our work contributes a new interaction style that supports users unfamiliar with prompting in achieving a satisfactory image output.
... We calculated the SUS-parity score (max = 100) from the collected UMUX-Lite responses [28]. Our system showed good usability [2] with x̄ = 72.4 ...
Preprint
Latent space representations are critical for understanding and improving the behavior of machine learning models, yet they often remain obscure and intricate. Understanding and exploring the latent space has the potential to contribute valuable human intuition and expertise about respective domains. In this work, we present HILL, an interactive framework allowing users to incorporate human intuition into the model training by interactively reshaping latent space representations. The modifications are infused into the model training loop via a novel approach inspired by knowledge distillation, treating the user's modifications as a teacher to guide the model in reshaping its intrinsic latent representation. The process allows the model to converge more effectively and overcome inefficiencies, as well as provide beneficial insights to the user. We evaluated HILL in a user study tasking participants to train an optimal model, closely observing the employed strategies. The results demonstrated that human-guided latent space modifications enhance model performance while maintaining generalization, but they also revealed the risks of including user biases. Our work introduces a novel human-AI interaction paradigm that infuses human intuition into model training and critically examines the impact of human intervention on training strategies and potential biases.
... Usability: To evaluate the usability of the conditions, we first transformed the UMUX-Lite scores into SUS-equivalent scores (Lewis et al. [2015]). We found a significant difference between the conditions (F(1, 106) = 18.845, p < .001). ...
Preprint
Integrated into websites, LLM-powered chatbots offer alternative means of navigation and information retrieval, leading to a shift in how users access information on the web. Yet, predominantly closed-source solutions limit proliferation among web hosts and lack transparency regarding implementation details and energy efficiency. In this work, we propose our openly available agent Talk2X, which leverages an adapted retrieval-augmented generation (RAG) approach combined with an automatically generated vector database, benefiting energy efficiency. Talk2X's architecture is generalizable to arbitrary websites, offering developers a ready-to-use tool for integration. Using a mixed-methods approach, we evaluated Talk2X's usability by tasking users to acquire specific assets from an open science repository. Talk2X significantly improved task completion time, correctness, and user experience, supporting users in quickly pinpointing specific information as compared to standard user-website interaction. Our findings contribute technical advancements to an ongoing paradigm shift in how we access information on the web.
... The UMUX-Lite score of ARAS using a 7-point Likert scale was 6.50 (SD = 0.35), and the corresponding SUS score, predicted using a regression equation based on the two UMUX-Lite items [46], was 82.48. According to the rating scale by Bangor et al. [6], this places ARAS in the "good" (close to "excellent") category for system usability. ...
Conference Paper
Wearable augmented reality (AR) systems have significant potential to enhance surgical outcomes through in-situ visualization of patient-specific data. Yet, efforts to develop AR-based systems for open surgery have been limited, lacking comprehensive interdisciplinary research and actual clinical evaluations in real surgical environments. Our research addresses this gap by presenting a user-centered design and development process of ARAS, an AR assistance for open pancreatic surgery. ARAS provides in-situ visualization of critical structures, such as the vascular system and the tumor, while offering a robust dual-layer registration method ensuring accurate registration during relevant phases of the surgery. We evaluated ARAS in clinical trials of 20 patients with pancreatic tumors. Accuracy validation and postoperative surgeon interviews confirmed its successful deployment, supporting surgeons in vascular localization and critical decision-making. Our work showcases AR's potential to fundamentally transform procedures for complex surgical operations, advocating a research shift toward ecological validation in open surgery.
... The social workers responded to the usability questions (UMUX-Lite) using a 7-point Likert scale, agreeing that the chatbot's capabilities met expectations toward addressing social needs (average score 5.4, SD 1.1) and that it is easy to use (average score 5.6, SD 1.8). Collectively, they agreed on the usability of the chatbot, providing an average score of 72 on the System Usability Scale (calculated using the regression equation developed by Lewis et al. [38]). ...
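Both SUS-equivalent figures quoted in these citing excerpts (82.48 for ARAS and 72 for the chatbot) can be reproduced from the reported item means. This sketch assumes the commonly cited coefficients of the Lewis et al. regression formula (slope 0.65, intercept 22.9) and assumes that the ARAS "UMUX-Lite score" of 6.50 is the mean response per item. Because the formula is linear, applying it to item means gives the same result as averaging individually converted scores.

```python
def sus_equivalent(mean_item1: float, mean_item2: float) -> float:
    """Apply the assumed UMUX-LITE regression formula to 7-point item means."""
    raw = (mean_item1 + mean_item2 - 2) * 100 / 12  # rescale to 0-100
    return 0.65 * raw + 22.9                         # assumed slope and intercept


# ARAS excerpt: UMUX-Lite score of 6.50 on a 7-point scale, taken here as both item means
aras = sus_equivalent(6.50, 6.50)
# DAPHNE excerpt: item means of 5.4 (capabilities) and 5.6 (ease of use)
daphne = sus_equivalent(5.4, 5.6)

print(f"ARAS:   {aras:.2f}")    # ARAS:   82.48
print(f"DAPHNE: {daphne:.2f}")  # DAPHNE: 71.65
```

The DAPHNE value of 71.65 rounds to the reported score of 72.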
Article
Background: Health outcomes are significantly influenced by unmet social needs. Although screening for social needs has become common in health care settings, there is often poor linkage to resources after needs are identified. The structural barriers (eg, staffing, time, and space) to helping address social needs could be overcome by a technology-based solution.
Objective: This study aims to present the design and evaluation of a chatbot, DAPHNE (Dialog-Based Assistant Platform for Healthcare and Needs Ecosystem), which screens for social needs and links patients and families to resources.
Methods: This research used a three-stage study approach: (1) an end-user survey to understand unmet needs and perception toward chatbots, (2) iterative design with interdisciplinary stakeholder groups, and (3) a feasibility and usability assessment. In study 1, a web-based survey was conducted with low-income US resident households (n=201). Following that, in study 2, web-based sessions were held with an interdisciplinary group of stakeholders (n=10) using thematic and content analysis to inform the chatbot’s design and development. Finally, in study 3, the assessment of feasibility and usability was completed via a mix of a web-based survey and focus group interviews following scenario-based usability testing with community health workers (family advocates; n=4) and social workers (n=9). We reported descriptive statistics and chi-square test results for the household survey. Content analysis and thematic analysis were used to analyze qualitative data. The usability score was reported descriptively.
Results: Among the survey participants, employed and younger individuals reported a higher likelihood of using a chatbot to address social needs, in contrast to the oldest age group. Regarding designing the chatbot, the stakeholders emphasized the importance of provider-technology collaboration, inclusive conversational design, and user education. The participants found that the chatbot’s capabilities met expectations and that the chatbot was easy to use (System Usability Scale score = 72/100). However, there were common concerns about the accuracy of suggested resources, electronic health record integration, and trust with a chatbot.
Conclusions: Chatbots can provide personalized feedback for families to identify and meet social needs. Our study highlights the importance of user-centered iterative design and development of chatbots for social needs. Future research should examine the efficacy, cost-effectiveness, and scalability of chatbot interventions to address social needs.
... Its brevity helps avoid overburdening the participants with too many questions. Lewis et al. [18] showed that UMUX-LITE scores can be used to calculate an approximate score for the System Usability Scale (SUS) [19], as "the correlation between the SUS and UMUX-LITE [is] significant and substantial". Because SUS is an established questionnaire, it has been extensively analyzed. ...
Article
This article describes the design and evaluation of a virtual field trip on the topic of radioactive waste management research for university education. We created an interactive virtual tour through the Mont Terri underground research laboratory by enhancing the virtual experiment information system, designed for domain experts, with background information, illustrations, tasks, tests, and an improved user interface. To put the tour's content into context, a conventional introductory presentation on the final disposal of radioactive waste was added. A user study with 22 participants showed good perceived usability of the virtual tour and demonstrated the virtual field trip's ability to transfer knowledge. These results suggest a benefit of employing virtual field trips in geoscientific university courses. In addition, it is conceivable to use the virtual field trip as a tool for science communication in the context of participatory processes during nuclear waste disposal site selection.
Article
Intelligent systems, such as chatbots, are likely to introduce new qualities of UX that are not covered by instruments validated for legacy human–computer interaction systems. A new validated tool to evaluate the interaction quality of chatbots is the chatBot Usability Scale (BUS), composed of 11 items in five subscales. The BUS-11 was developed mainly from a psychometric perspective, focusing on ranking people by their responses, and also on comparing designs’ properties (designometric). In this article, 3186 observations (BUS-11) on 44 chatbots are used to re-evaluate the inventory, looking at its factorial structure and reliability from the psychometric and designometric perspectives. We were able to identify a simpler factor structure of the scale than previously thought. With the new structure, the psychometric and the designometric perspectives coincide, with good to excellent reliability. Moreover, we provided standardized scores to interpret the outcomes of the scale. We conclude that BUS-11 is a reliable and universal scale, meaning that it can be used to rank people and designs, whatever the purpose of the research.
Article
In 2010, Kraig Finstad published (in this journal) ‘The Usability Metric for User Experience’—the UMUX. The UMUX is a standardized usability questionnaire designed to produce scores similar to the System Usability Scale (SUS), but with 4 rather than 10 items. The development of the questionnaire followed standard psychometric practice. Psychometric evaluation of the final version of the UMUX indicated acceptable levels of reliability (internal consistency), concurrent validity, and sensitivity. Critical review of this research suggests that its weakest element was the structural analysis, which concluded that the UMUX is unidimensional based on insufficient evidence. Mixed-tone item content and parallel analysis of the eigenvalues point to a possible two-factor structure. This weakness, however, is of more theoretical than practical importance, given the overall scale’s apparent reliability, validity, and sensitivity.
Conference Paper
In this paper we present the UMUX-LITE, a two-item questionnaire based on the Usability Metric for User Experience (UMUX) [6]. The UMUX-LITE items are "This system's capabilities meet my requirements" and "This system is easy to use." Data from two independent surveys demonstrated adequate psychometric quality of the questionnaire. Estimates of reliability were .82 and .83, excellent for a two-item instrument. Concurrent validity was also high, with significant correlations with the SUS (.81, .81) and with likelihood-to-recommend (LTR) scores (.74, .73). The scores were sensitive to respondents' frequency of use. UMUX-LITE score means were slightly lower than those for the SUS, but were easily adjusted using linear regression to match the SUS scores. Due to its parsimony (two items), reliability, validity, structural basis (usefulness and usability) and, after applying the corrective regression formula, its correspondence to SUS scores, the UMUX-LITE appears to be a promising alternative to the SUS when it is not desirable to use a 10-item instrument.
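The reliability estimates quoted above are internal-consistency coefficients. Assuming they are coefficient (Cronbach's) alpha, the usual estimate for such scales, they can be computed as in this minimal sketch; the function name and any data you feed it are illustrative only.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Coefficient alpha for k items; each inner list holds one item's
    responses, in the same respondent order across items."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("items need the same number (>= 2) of respondents")
    totals = [sum(resp) for resp in zip(*items)]      # per-respondent scale totals
    item_var = sum(variance(col) for col in items)    # sum of item (sample) variances
    return k / (k - 1) * (1 - item_var / variance(totals))
```

With only two items, alpha reduces to 2 × (1 - (var1 + var2) / var(total)), so it depends entirely on how strongly the two items covary: perfectly matching items give 1, uncorrelated items give 0.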
Article
Usability does not exist in any absolute sense; it can only be defined with reference to particular contexts. This, in turn, means that there are no absolute measures of usability, since, if the usability of an artefact is defined by the context in which that artefact is used, measures of usability must of necessity be defined by that context too. Despite this, there is a need for broad general measures which can be used to compare usability across a range of contexts. In addition, there is a need for "quick and dirty" methods to allow low-cost assessments of usability in industrial systems evaluation. This chapter describes the System Usability Scale (SUS), a reliable, low-cost usability scale that can be used for global assessments of system usability.
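The SUS combines its item responses into a single 0-100 score. A minimal sketch of the standard scoring rule, assuming the usual layout of 10 items on a 1-5 agreement scale with odd-numbered items positively worded and even-numbered items negatively worded; the function name is illustrative.

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must be between 1 and 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd items are positive (contribute r - 1); even items are reversed (5 - r).
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5  # maximum raw sum is 40, so 2.5 maps it to 100
```

The resulting value is a rescaled sum, not a percentage, so a score of 70 does not mean "70% usable".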
Conference Paper
When designing questionnaires there is a tradition of including items with both positive and negative wording to minimize acquiescence and extreme response biases. Two disadvantages of this approach are respondents accidentally agreeing with negative items (mistakes) and researchers forgetting to reverse the scales (miscoding). The original System Usability Scale (SUS) and an all positively worded version were administered in two experiments (n=161 and n=213) across eleven websites. There was no evidence for differences in the response biases between the different versions. A review of 27 SUS datasets found 3 (11%) were miscoded by researchers and 21 out of 158 questionnaires (13%) contained mistakes from users. We found no evidence that the purported advantages of including negative and positive items in usability questionnaires outweigh the disadvantages of mistakes and miscoding. It is recommended that researchers using the standard SUS verify the proper coding of scores and include procedural steps to ensure error-free completion of the SUS by users. Researchers can use the all positive version with confidence because respondents are less likely to make mistakes when responding, researchers are less likely to make errors in coding, and the scores will be similar to the standard SUS.
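Scoring the all-positive version is simpler because no item needs to be reversed. This sketch scores it and adds a purely hypothetical screening rule for the standard version that flags respondents who agree strongly with both the positive (odd) and the negative (even) items, one possible procedural check of the kind the paper recommends. The rule and its threshold of 4 are assumptions, not taken from the paper.

```python
from statistics import mean


def sus_all_positive(responses: list[int]) -> float:
    """Score the all-positive SUS: every item contributes (response - 1)."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected 10 responses on a 1-5 scale")
    return sum(r - 1 for r in responses) * 2.5


def looks_inconsistent(standard_responses: list[int], threshold: float = 4.0) -> bool:
    """Hypothetical check for the standard SUS: high agreement with both the
    positively worded (odd) and negatively worded (even) items suggests a
    response mistake or a miscoded record."""
    odd_items = standard_responses[0::2]   # items 1, 3, 5, 7, 9
    even_items = standard_responses[1::2]  # items 2, 4, 6, 8, 10
    return mean(odd_items) >= threshold and mean(even_items) >= threshold
```

A flagged record is only a candidate for review; the rule cannot tell a careless respondent from a genuinely mixed opinion.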
Conference Paper
Since its introduction in 1986, the 10-item System Usability Scale (SUS) has been assumed to be unidimensional. Factor analysis of two independent SUS data sets reveals that the SUS actually has two factors: Usability (8 items) and Learnability (2 items). These new scales have reasonable reliability (coefficient alpha of .91 and .70, respectively). They correlate highly with the overall SUS (r = .985 and .784, respectively) and correlate significantly with one another (r = .664), but at a low enough level to use as separate scales. A sensitivity analysis using data from 19 tests had a significant Test by Scale interaction, providing additional evidence of the differential utility of the new scales. Practitioners can continue to use the current SUS as is, but, at no extra cost, can also take advantage of these new scales to extract additional information from their SUS data.
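Both subscales can be put on the same 0-100 metric as the overall SUS. This sketch assumes, as in the published factor analysis, that Learnability comprises items 4 and 10 and Usability the remaining eight; the scaling factors 12.5 and 3.125 simply map each subscale's maximum raw sum (8 and 32) to 100. Names are illustrative.

```python
LEARNABILITY_ITEMS = {4, 10}  # assumed item numbers of the Learnability factor


def item_contribution(item_number: int, response: int) -> int:
    """Standard SUS contribution (0-4): odd items positive, even items reversed."""
    return response - 1 if item_number % 2 == 1 else 5 - response


def sus_subscales(responses: list[int]) -> tuple[float, float]:
    """Return (usability, learnability), each rescaled to 0-100."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    contributions = {i: item_contribution(i, r) for i, r in enumerate(responses, start=1)}
    learn_raw = sum(c for i, c in contributions.items() if i in LEARNABILITY_ITEMS)
    usab_raw = sum(c for i, c in contributions.items() if i not in LEARNABILITY_ITEMS)
    return usab_raw * 3.125, learn_raw * 12.5
```

With this scaling the overall SUS equals 0.8 × Usability + 0.2 × Learnability, which helps explain why the overall score correlates so highly with the Usability subscale.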
Article
The Usability Metric for User Experience (UMUX) is a four-item Likert scale used for the subjective assessment of an application’s perceived usability. It is designed to provide results similar to those obtained with the 10-item System Usability Scale, and is organized around the ISO 9241–11 definition of usability. A pilot version was assembled from candidate items, which was then tested alongside the System Usability Scale during usability testing. It was shown that the two scales correlate well, are reliable, and both align on one underlying usability factor. In addition, the Usability Metric for User Experience is compact enough to serve as a usability module in a broader user experience metric.
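The UMUX is scored much like the SUS, but on a 7-point scale with alternating item tone. A minimal sketch, assuming items 1 and 3 are positively worded and items 2 and 4 negatively worded, as in the published scale: each item contributes 0-6 points, and the sum (maximum 24) is rescaled to 0-100. The function name is illustrative.

```python
def umux_score(responses: list[int]) -> float:
    """UMUX score (0-100) from four 7-point responses (1 = strongly disagree)."""
    if len(responses) != 4 or any(not 1 <= r <= 7 for r in responses):
        raise ValueError("expected 4 responses on a 1-7 scale")
    total = sum(
        r - 1 if i % 2 == 1 else 7 - r  # odd items positive, even items reversed
        for i, r in enumerate(responses, start=1)
    )
    return total / 24 * 100
```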
Book
You're being asked to quantify your usability improvements with statistics. But even with a background in statistics, you may be hesitant to analyze your data statistically, unsure which tests to use and how to defend small test sample sizes. This book provides a practical guide to solving common quantitative problems that arise in usability testing. It addresses questions you face every day, such as: Is the current product more usable than our competition? Can we be sure at least 70% of users can complete the task on the first attempt? How long will it take users to purchase products on the website? The book shows you which test to use and provides a foundation in both the statistical theory and the best practices for applying it. The authors draw on decades of statistical literature from Human Factors, Industrial Engineering, and Psychology, as well as their own published research, to provide the best solutions. They provide both concrete solutions (Excel formulas, links to their own web calculators) and an engaging discussion of why the tests work and how to communicate the results effectively.
• Provides practical guidance on solving usability testing problems with statistics for any project, including those using Six Sigma practices
• Shows practitioners which test to use, why it works, and best practices in application, along with easy-to-use Excel formulas and web calculators for analyzing data
• Recommends ways for practitioners to communicate results to stakeholders in plain English
© 2012 Jeff Sauro and James R. Lewis. Published by Elsevier Inc. All rights reserved.
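The blurb's example question about a 70% first-attempt completion rate can be approached with a confidence interval around the observed rate. The sketch below uses the adjusted-Wald interval, a small-sample method for binomial proportions; treat it as one possible approach rather than the book's exact procedure. The counts are hypothetical, and checking whether the lower bound of a two-sided 90% interval exceeds 0.70 amounts to using a one-sided 95% lower confidence bound.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald(successes: int, n: int, confidence: float = 0.90) -> tuple[float, float]:
    """Adjusted-Wald interval: add z^2/2 successes and z^2 trials, then apply the Wald formula."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z * z
    p_adj = (successes + z * z / 2) / n_adj
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid range of a proportion.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical data: 18 of 20 users complete the task on the first attempt.
low, high = adjusted_wald(successes=18, n=20)
print(f"90% CI: {low:.3f} to {high:.3f}")
print("lower bound above 70%:", low > 0.70)
```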
Article
The Usability Metric for User Experience (UMUX) is a four-item Likert scale aimed at replicating the psychometric properties of the System Usability Scale (SUS) in a more compact form. As part of a special issue of the journal Interacting with Computers, the UMUX is being examined in terms of purpose, reliability, validity and structure. This response to commentaries addresses concerns with these issues through updated archival research, deeper analysis on the original data and some updated results with an average-scoring system. The new results show the UMUX performs as expected for a wide range of systems and consists of one underlying usability factor.