Table 3 - uploaded by James R. Lewis
Factor Analysis of the UMUX.
Source publication
In this paper we present the UMUX-LITE, a two-item questionnaire based on the Usability Metric for User Experience (UMUX) [6]. The UMUX-LITE items are "This system's capabilities meet my requirements" and "This system is easy to use." Data from two independent surveys demonstrated adequate psychometric quality of the questionnaire. Estimates of reliab...
Context in source publication
Context 1
... one survey, respondents completed the Positive version of the SUS (n = 402); in the other they completed the Standard version (n = 389). Table 3 shows the results of a factor analysis of the UMUX items combined across the datasets (analyses by dataset showed the same pattern). As predicted in the review of the original UMUX research [8], the UMUX had a clear bidimensional structure with positive-tone items aligning with one factor and negative-tone items aligning with the other, a solution supported by parallel analysis of the eigenvalues [4]. ...
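The parallel analysis of eigenvalues mentioned above (Horn's method) can be sketched as follows. This is a minimal illustration, not the authors' code: observed eigenvalues of the item correlation matrix are retained only while they exceed the mean eigenvalues obtained from random data of the same shape. The function name and simulation settings are our own.

```python
import numpy as np

def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's parallel analysis: count factors whose observed
    eigenvalues exceed the mean eigenvalues of random data
    with the same number of rows and columns."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, descending.
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    # Average eigenvalues over correlation matrices of random data.
    rand_eig = np.zeros(p)
    for _ in range(n_iter):
        rand = rng.standard_normal((n, p))
        rand_eig += np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1]
    rand_eig /= n_iter
    return int(np.sum(obs_eig > rand_eig))
```

On data simulated with two strong latent factors, the criterion recovers a two-factor solution, mirroring the bidimensional structure reported for the UMUX.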
Similar publications
It is well evidenced that Early Maladaptive Schemas (EMS) are important mental health determinants, particularly in adolescents and young adults. The short version of the Young Schema Questionnaire (YSQ-S3) is widely used globally to assess EMS, and has yet to be validated in the Arabic language. The aim of the current study was to validate the Ara...
Unfolding theory maps individuals and stimuli into a common latent space, reflecting preferences through proximity. Unlike traditional psychometric methods, unfolding models are complex and computationally intensive, leading to their underuse despite their potential for nuanced insights. This study presents a tutorial on unidimensional unfolding, u...
Purpose
Emotional eating (EE) refers to eating in response to (negative) emotions. Evidence for the validity of EE is mixed: some meta-analyses find EE only in eating-disordered patients, others only in restrained eaters, which suggests that only certain subgroups show EE. Furthermore, EE measures from lab-based assessments, ecological momentary ass...
Aims and background
The main national and international organizations recommend continuous monitoring of psychological distress in cancer patients throughout the disease trajectory. The reasons for this concern are the high prevalence of psychological distress in cancer patients and its association with a worse quality of life, poor adherence to treatm...
The study examined the psychometric properties of the Children’s Emotional Adjustment Scale–Preschool Version (CEAS-P), a new behavioral rating scale completed by parents. The scale measures preschoolers’ emotional functioning across three competency-based factors (Temper control, Social assertiveness, Anxiety control) anchored on healthy emotional...
Citations
... Existing research on chatbot usability and UX covers a broad spectrum of applications, with studies examining factors like task completion, user satisfaction, effectiveness, and specific usability metrics such as the System Usability Scale (SUS) (e.g., [9,14,49,51,54,56,58,60-62,64-66]), the Chatbot Usability Questionnaire (CUQ) (e.g., [48,50,53,56,64]), and the User Experience Questionnaire (UEQ) (e.g., [47,55,57,60,64]). The Usability Metric for User Experience LITE (UMUX-LITE) [67] offers a quick and reliable measure of usability with a strong correlation to the SUS. Borsci et al. validated the UMUX-LITE extensively in the context of chatbot usability, demonstrating its applicability and robustness for evaluating conversational systems and highlighting its advantages as a streamlined alternative to traditional usability scales [11,12]. ...
Qualitative data analysis (QDA) tools are essential for extracting insights from complex datasets. This study investigates researchers’ perceptions of the usability, user experience (UX), mental workload, trust, task complexity, and emotional impact of three tools: Taguette 1.4.1 (a traditional QDA tool), ChatGPT (GPT-4, December 2023 version), and Gemini (formerly Google Bard, December 2023 version). Participants (N = 85), Master’s students from the Faculty of Electrical Engineering and Computer Science with prior experience in UX evaluations and familiarity with AI-based chatbots, performed sentiment analysis and data annotation tasks using these tools, enabling a comparative evaluation. The results show that AI tools were associated with lower cognitive effort and more positive emotional responses compared to Taguette, which caused higher frustration and workload, especially during cognitively demanding tasks. Among the tools, ChatGPT achieved the highest usability score (SUS = 79.03) and was rated positively for emotional engagement. Trust levels varied, with Taguette preferred for task accuracy and ChatGPT rated highest in user confidence. Despite these differences, all tools performed consistently in identifying qualitative patterns. These findings suggest that AI-driven tools can enhance researchers’ experiences in QDA while emphasizing the need to align tool selection with specific tasks and user preferences.
... As a preliminary system evaluation, we aim to assess the usability of the system. We did this using a 5-point user satisfaction assessment and the Usability Metric for User Experience lite (UMUX-lite) [9] evaluation. Participants were recruited via the ORKG ASK production system, via a non-intrusive tooltip asking real-world system users for their opinion. ...
Purpose: Finding scholarly articles is a time-consuming and cumbersome activity, yet crucial for conducting science. Due to the growing number of scholarly articles, new scholarly search systems are needed to effectively assist researchers in finding relevant literature. Methodology: We take a neuro-symbolic approach to scholarly search and exploration by leveraging state-of-the-art components, including semantic search, Large Language Models (LLMs), and Knowledge Graphs (KGs). The semantic search component composes a set of relevant articles. From this set of articles, information is extracted and presented to the user. Findings: The presented system, called ORKG ASK (Assistant for Scientific Knowledge), provides a production-ready search and exploration system. Our preliminary evaluation indicates that our proposed approach is indeed suitable for the task of scholarly information retrieval. Value: With ORKG ASK, we present a next-generation scholarly search and exploration system and make it available online. Additionally, the system components are open source with a permissive license.
... Participants' reactions to the courses and the estimated patient acceptance of structured interviews were evaluated at t3 by means of an online questionnaire, which consisted of 32 selected items (Table 2) from several instruments [6,46-50]. There were 8 additional items administered only in the blended learning condition. ...
... VisAWI-S [49] (Visual Aesthetics, 4 items, 0.76; sample item: "The layout is professional"). One item used the German grading system ranging from 1 (excellent) to 6 (insufficient); another used a visual analog scale ranging from 0 (not at all satisfied) to 100 (completely satisfied). ...
Background:
Clinical diagnoses determine if and how therapists treat their patients. As misdiagnoses can have severe adverse effects, disseminating evidence-based diagnostic skills into clinical practice is highly important.
Objective:
This study aimed to develop and evaluate a blended learning course in a multicenter cluster randomized controlled trial.
Methods:
Undergraduate psychology students (N=350) enrolled in 18 university courses at 3 universities. The courses were randomly assigned to blended learning or traditional synchronous teaching. The primary outcome was the participants’ performances in a clinical diagnostic interview after the courses. The secondary outcomes were diagnostic knowledge and participants’ reactions to the courses. All outcomes were analyzed on the individual participant level using noninferiority testing.
Results:
Compared with the synchronous course (74.6% pass rate), participation in the blended learning course (89% pass rate) increased the likelihood of successfully passing the behavioral test (odds ratio 2.77, 95% CI 1.55-5.13), indicating not only noninferiority but superiority of the blended learning course. Furthermore, superiority of the blended learning over the synchronous course could be found regarding diagnostic knowledge (β=.13, 95% CI 0.01-0.26), course clarity (β=.40, 95% CI 0.27-0.53), course structure (β=.18, 95% CI 0.04-0.32), and informativeness (β=.19, 95% CI 0.06-0.32).
Conclusions:
Blended learning can help to improve the diagnostic skills and knowledge of (future) clinicians and thus make an important contribution to improving mental health care.
... The well-known PARAdigm for DIalogue System Evaluation (PARADISE) framework [21] suggests that interaction quality with chatbots should be considered a weighted product of success in achieving the tasks (maximize task success) at an acceptable cost (efficiency and quality of the chatbot's performance). However, as acknowledged by Borsci and colleagues [22-24], without a specifically designed instrument to measure chatbot user experience, experts cannot reliably compare their results and often use qualitative instruments, or scales developed for point-and-click interaction (e.g., the System Usability Scale [25], the Usability Metric for User Experience Lite version (UMUX-Lite) [26]), or scales developed for conversational interfaces, e.g., the Speech User Interface Service Quality scale [27] and the Subjective Assessment of Speech System Interfaces [28]. Satisfaction scales like the SUS and UMUX are highly reliable (Cronbach's α > 0.7 [29]) and validated measures of the quality of the experience after interaction with interactive systems; nevertheless, these scales were not meant to capture the aspects associated with textual or verbal dialogical exchanges, i.e., the ability to maintain the sense of the conversation and its context in an efficient and effective way [30,31]. ...
... The confirmatory study supported a solution of 5 main factors and 11 items on a 5-point Likert scale from Strongly Disagree to Strongly Agree. The BUS-11 (see Table 2) resulted in a highly reliable inventory (Cronbach's α > 0.9) with a good correlation with classic short scales of satisfaction, i.e., UMUX-Lite [26]. The overall score of the scale is currently calculated by averaging all the items and transforming the result into a percentage. ...
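The BUS-11 scoring rule described here (average the 11 items, then transform to a percentage) might be sketched as follows. The exact rescaling from the 1-5 Likert range to 0-100 is our assumption, not taken from the source.

```python
def bus11_percentage(ratings):
    """Sketch of the BUS-11 overall score described in the text:
    average the 11 item ratings (5-point Likert, 1-5) and rescale
    to a 0-100 percentage. The rescaling is an assumption here."""
    mean = sum(ratings) / len(ratings)
    return (mean - 1) / 4 * 100
```

Under this assumption, all-"Strongly Agree" responses (eleven 5s) map to 100 and all-neutral responses (eleven 3s) map to 50.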
Intelligent systems, such as chatbots, are likely to strike new qualities of UX that are not covered by instruments validated for legacy human–computer interaction systems. A new validated tool to evaluate the interaction quality of chatbots is the chatBot Usability Scale (BUS), composed of 11 items in five subscales. The BUS-11 was developed mainly from a psychometric perspective, focusing on ranking people by their responses, but it can also be used to compare designs' properties (designometric perspective). In this article, 3186 observations (BUS-11) on 44 chatbots are used to re-evaluate the inventory, looking at its factorial structure and reliability from the psychometric and designometric perspectives. We were able to identify a simpler factor structure of the scale than previously thought. With the new structure, the psychometric and designometric perspectives coincide, with good to excellent reliability. Moreover, we provide standardized scores to interpret the outcomes of the scale. We conclude that the BUS-11 is a reliable and universal scale, meaning that it can be used to rank both people and designs, whatever the purpose of the research.
... Participants rated usability and utility after each task using UMUX-LITE [43] and NASA-TLX scales [30] (Appendix C.3.1). Think-aloud data and interviews were transcribed and analyzed through reflexive thematic analysis [10]. ...
... The first scale is a slightly adapted version of the Usability Metric for User Experience-lite (UMUX-lite) [67]. It is based on the well-established System Usability Scale (SUS) [68]. ...
... The UMUX-lite is also considered suitable for healthcare technology assessment [69]. A corrective regression formula is used to align its scores with those of the SUS [67]. ...
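The corrective regression mentioned here is reported in Lewis et al. (2013) as 0.65 × (raw UMUX-LITE score) + 22.9, where the two items use 7-point scales and the raw score is rescaled to 0-100. A minimal sketch (the function name is ours):

```python
def umux_lite_sus_estimate(item1: int, item2: int) -> float:
    """SUS-aligned UMUX-LITE score: rescale the two 7-point
    items (1-7) to 0-100, then apply the corrective regression
    reported by Lewis et al. (2013)."""
    raw = (item1 + item2 - 2) * (100 / 12)
    return 0.65 * raw + 22.9
```

For example, a respondent giving both items the maximum rating of 7 yields a raw score of 100 and an SUS-aligned estimate of 87.9.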
Background
We developed MARVIN, an artificial intelligence (AI)‐based chatbot that provides 24/7 expert‐validated information on self‐management‐related topics for people with HIV. This study assessed (1) the feasibility of using MARVIN, (2) its usability and acceptability, and (3) four usability subconstructs (perceived ease of use, perceived usefulness, attitude towards use, and behavioural intention to use).
Methods
In a mixed‐methods study conducted at the McGill University Health Centre, enrolled participants were asked to have 20 conversations within 3 weeks with MARVIN on predetermined topics and to complete a usability questionnaire. Feasibility, usability, acceptability, and usability subconstructs were examined against predetermined success thresholds. Qualitatively, randomly selected participants were invited to semi‐structured focus groups/interviews to discuss their experiences with MARVIN. Barriers and facilitators were identified according to the four usability subconstructs.
Results
From March 2021 to April 2022, 28 participants were surveyed after a 3‐week testing period, and nine were interviewed. Study retention was 70% (28/40). Mean usability exceeded the threshold (69.9/68), whereas mean acceptability was very close to target (23.8/24). Ratings of attitude towards MARVIN's use were positive (+14%), with the remaining subconstructs exceeding the target (5/7). Facilitators included MARVIN's reliable and useful real‐time information support, its easy accessibility, provision of convivial conversations, confidentiality, and perception as being emotionally safe. However, MARVIN's limited comprehension and the use of Facebook as an implementation platform were identified as barriers, along with the need for more conversation topics and new features (e.g., memorization).
Conclusions
The study demonstrated MARVIN's global usability. Our findings show its potential for HIV self‐management and provide direction for further development.
... An excessive number of items may cause participant fatigue and boredom, thus affecting the quality of responses and completion rates (Lenzner et al., 2010). Within the UX research community, many short questionnaires have been proposed to provide psychometric results consistent with long questionnaires, including the UMUX, a short version of the Aesthetics Scale, and the UMUX-LITE, which have two to four questions (Lewis et al., 2013; Wang et al., 2021). Thus, the items in the present questionnaire were streamlined to maintain simplicity. ...
Despite their growing number, cultural applications (apps) still face the challenge of inefficient cultural dissemination, necessitating user acceptance research. In this article, a conceptual cultural app acceptance model is proposed and tested in China (n = 351), using covariance-based structural equation modeling (CB-SEM). The results show that cultural identity (CI) has a significant impact on the perceived usefulness (PU) and the behavioral intention to use the app (BI). Aesthetics (AE) significantly influences CI, PU and perceived ease of use (PEOU) but does not affect BI. Additionally, PU and PEOU have a weak impact on BI. Demographic analysis found significant differences between genders when examining CI and AE. There are also significant differences in CI, AE, PU, PEOU and BI with or without using experience-based criteria. Our findings can help to understand users' needs and preferences for mobile cultural apps and to promote digital culture.
... As such, we introduced a baseline interaction method through buttons on either side of the handlebar. In a within-subject design with interaction type (either button or Head 'n Shoulder) as the independent variable, we measured the usability of the two conditions during a short bike ride of 10 minutes on average, using a set of custom questions (see Table 1) and the UMUX-Lite questionnaire [32]. The experiment area was closed off from regular traffic, ensuring the participants' safety while giving them freedom to select the route. ...
... Participants received a gesture request on average every 10 seconds. After finishing their first run (approximately 10 minutes), participants completed the questionnaire containing custom questions on perceived bike control and safety (Table 1), as well as the UMUX-Lite questionnaire [32]. In the second bike run, participants used the other condition and likewise answered the same questions at the end of the run. ...
Distractions caused by digital devices are increasingly causing dangerous situations on the road, particularly for more vulnerable road users like cyclists. While researchers have been exploring ways to enable richer interaction scenarios on the bike, safety concerns are frequently neglected and compromised. In this work, we propose Head 'n Shoulder, a gesture-driven approach to bike interaction without affecting bike control, based on a wearable garment that allows hands- and eyes-free interaction with digital devices through integrated capacitive sensors. It achieves an average accuracy of 97% in the final iteration, evaluated on 14 participants. Head 'n Shoulder does not rely on direct pressure sensing, allowing users to wear their everyday garments on top or underneath, not affecting recognition accuracy. Our work introduces a promising research direction: easily deployable smart garments with a minimal set of gestures suited for most bike interaction scenarios, sustaining the rider's comfort and safety.
... We will only use simulated data to illustrate our approach, as well as dummy products that we name Product A to Product F, for simplicity. All UX questionnaires presented below are publicly available on the internet and are a common standard in many software enterprises (see e.g., Laugwitz et al. 2006; Fisher and Kordupleski 2019; Lewis et al. 2013). The article itself targets decision makers as well as developers, designers, and managers. ...
Converting customer survey feedback data into usable insights has always been a great challenge for large software enterprises. Despite the improvements on this field, a major obstacle often remains when drawing the right conclusions out of the data and channeling them into the software development process. In this paper we present a practical end-to-end approach of how to extract useful information out of a data set and leverage the information to drive change. We describe how to choose the right metrics to measure, gather appropriate feedback from customer end-users, analyze the data by leveraging methods from inferential statistics, make the data transparent, analyze large volumes of user comments efficiently with Large Language Models, and finally drive change with the results. Furthermore, we present an example of a UX dashboard that can be used to communicate the analyses to stakeholders within the company.
... Due to the limited time, we had to select short instruments that could be quickly administered. We selected the UMUX-LITE questionnaire to measure usability in two questions (Lewis et al., 2013), and the Net Promoter Score survey to assess user experience in one question (Reichheld, 2007). We chose to administer the instruments orally instead of having clinicians open an online survey; this avoided issues we had experienced in other projects, such as participants not finding the chat window to access a survey link. ...
... We were bound to many emerging constraints, including limited time with clinicians, scheduling changes, and low engagement of the technology team. Given the limited time available with clinicians, we decided to use shorter instruments (UMUX-LITE, NPS) despite our initial inclination to use instruments validated against more contexts and user populations like the system usability scale (SUS; see Lewis et al., 2013). Our nurse team member helped navigate last minute scheduling changes for providers who needed to put patient care first by swiftly rebooking missed evaluation sessions. ...
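The one-question Net Promoter Score mentioned in these excerpts is conventionally computed (per Reichheld) as the percentage of promoters minus the percentage of detractors on a 0-10 likelihood-to-recommend item. A minimal sketch of that convention, not code from the cited study:

```python
def net_promoter_score(ratings):
    """Standard NPS: percentage of promoters (ratings 9-10)
    minus percentage of detractors (ratings 0-6) on a 0-10
    likelihood-to-recommend item; passives (7-8) are ignored."""
    n = len(ratings)
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / n
```

For instance, ratings of [10, 9, 8, 6] give two promoters, one passive, and one detractor, for an NPS of 25.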
In this paper, we discuss the user-centered design (UCD) strategies we employed to design an electronic health record (EHR)-integrated application for documentation and tracking of guideline-directed medical therapy for patients with heart failure with reduced ejection fraction. We designed this clinician-facing application by engaging heart failure clinicians, patients and family members, and technology experts. We report on these strategies that, while not uncommon for patient-facing technologies, are less often seen for the design and evaluation of clinician-facing EHR applications. We also provide suggestions for practitioners seeking to build EHR applications and propose avenues for future research.