Chapter (PDF available)

Usability Testing

Authors:
  • MeasuringU

Abstract

Covers the basics of usability testing plus some statistical topics (sample size estimation, confidence intervals, and standardized usability questionnaires).
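One of the chapter's statistical topics, confidence intervals for small-sample task data, can be sketched briefly. The adjusted-Wald (Agresti-Coull) interval is a common choice for binomial measures such as task completion rates at the small sample sizes typical of usability tests. A minimal Python illustration (the z = 1.96 default assumes a two-sided 95% interval):

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a
    binomial proportion, e.g. a task completion rate."""
    # Add z^2/2 pseudo-successes and z^2 pseudo-trials, then
    # compute an ordinary Wald interval on the adjusted proportion.
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 9 of 10 users completed the task
low, high = adjusted_wald_ci(9, 10)
print(f"{low:.2f} to {high:.2f}")
```

Note how wide the interval remains with n = 10: the point estimate of 0.90 is compatible with true completion rates well below it, which is exactly why interval estimates matter in small-sample usability work.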
... On the other hand, usability tests, according to Lewis (2012) and Riihiaho (2017), are essential methods for evaluating the functionality and user experience of web applications. These tests involve directly observing real users while they perform specific tasks or verbalize their thoughts during the process. ...
... Wichansky (2010) highlights the value of usability tests for measuring user performance and satisfaction with a software system or product: they yield valuable information about usability difficulties and lead to proposed solutions for improving the user experience. Lewis (2012), like Nelson and Stavrou (2011), highlights the advantages of usability tests, such as direct interaction with the application, the identification and resolution of usability problems, the collection of useful data, and the correction of faults to improve the user experience. However, they also note some disadvantages, such as the possibility that users do not take the tests seriously, the difficulty of detecting distractions in an uncontrolled environment, and users' failure to follow instructions. ...
Article
The development of scientific research has grown notably in recent years, giving rise to digital applications aimed at preserving research corpora, such as the DYAC developed by the Universidad del Azuay. In this context, the objective of this study was to evaluate the usability of the DYAC web application with two groups of users: researchers and registered users. A case study was carried out with 68 participants (59 students and 9 teacher-researchers), who completed test cases and the WAMMI survey. The test cases elicited user comments about usability difficulties in the application concerning navigation, interface design, search, architecture, functionality, and learnability, as well as aspects to improve related to label semantics, font size, use of color, and architecture. The WAMMI survey yielded an overall usability score of 66.09, indicating a satisfactory rating. It is concluded that usability tests provide relevant information for this research, since they serve as a guide for making significant improvements to the application. Keywords: Case study, DYAC, test cases, usability testing, usability DYAC, WAMMI survey.
... The usability testing method (Lewis, 2012) was employed, which involves observers monitoring the performance of participants as they interact with a tool, in this case ChatGPT 3.5. The study involved evaluations by experts with extensive experience in generating reading comprehension questions aligned with the PIRLS 2011 framework. ...
Article
Full-text available
This study examined the use of ChatGPT 3.5 for generating reading questions based on the four comprehension processes of the PIRLS 2011 assessment framework. Using an instrumental case study approach and a usability testing method, we employed an input story text to assess ChatGPT 3.5’s effectiveness. A total of twenty questions were generated and evaluated using specific criteria. We analyzed the content of the obtained questions and the quality of revised questions after further adjustment instructions. Findings reveal that ChatGPT 3.5 excels at generating factual questions, leveraging explicit details from text, and demonstrates improvement in interpret-and-integrate questions when detailed instructions are provided. However, it shows significant limitations with higher-order questions requiring inference and evaluation, where contextual accuracy and deeper comprehension are critical. Statistical analysis revealed significant differences between question types (p = 0.004). Thematic analysis highlighted recurring challenges, such as content misalignment, oversimplification, and difficulty processing complex or nuanced user requirements. Tips for users on potential issues and ethical concerns were also discussed. Despite limitations, ChatGPT 3.5 remains a valuable tool for educators and students to enhance question creation productivity. These findings contribute to understanding generative AI’s role in education and provide actionable insights for improving question generation efficiency.
Article
Full-text available
Objective The objectives encompassed (1) the creation of Recuerdame, a digital app specifically designed for occupational therapists, aiming to support these professionals in the processes of planning, organizing, developing, and documenting reminiscence therapies for older people with dementia, and (2) the evaluation of the designed prototype through a participatory and user-centered design approach, exploring the perceptions of end-users. Methods This exploratory research used a mixed-methods design. The app was developed in two phases. In the first phase, the research team identified the requirements and designed a prototype. In the second phase, experienced occupational therapists evaluated the prototype. Results The research team determined the app's required functionalities, grouped into eight major themes: register related persons and caregivers; record the patient's life story memories; prepare a reminiscence therapy session; conduct a session; end a session; assess the patient; automatically generate a life story; other requirements. The first phase ended with the development of a prototype. In the second phase, eight occupational therapists performed a series of tasks using all the application's functionalities. Most of these tasks were very easy (Single Ease Question). The level of usability was considered excellent (System Usability Scale). Participants believed that the app would save practitioners time, enrich therapy sessions and improve their effectiveness. The qualitative results were summarized in two broad themes: (a) acceptability of the app; and (b) areas for improvement. Conclusions Participating occupational therapists generally agreed that the co-designed app appears to be a versatile tool that empowers these professionals to manage reminiscence interventions.
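The SEQ and SUS instruments used in the study above have simple published scoring rules. As an illustration, a minimal Python sketch of standard SUS scoring, assuming the usual ten-item, 1-5, alternating-tone questionnaire format:

```python
def sus_score(responses):
    """Score one completed System Usability Scale questionnaire.

    `responses` is a list of ten 1-5 ratings in item order.
    Odd-numbered items are positively worded (score = rating - 1);
    even-numbered items are negatively worded (score = 5 - rating)."""
    assert len(responses) == 10
    total = 0
    for i, rating in enumerate(responses, start=1):
        total += (rating - 1) if i % 2 == 1 else (5 - rating)
    return total * 2.5  # rescale the 0-40 raw sum to 0-100

print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # → 100.0 (best possible)
```

Scores above roughly 80 are conventionally read as excellent, which matches the "level of usability was considered excellent" wording in the abstract.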
Article
The Bengkulu Selatan Regency Government has developed e-government by providing various information channels through websites across its Regional Apparatus Organizations (OPD), in this case the Housing and Settlement Service. A usability assessment is needed to support further development and to determine how well the e-government services meet the community's needs. Four aspects were evaluated in this study: System Usefulness, Information Quality, Interface Quality, and Overall. The study uses the Post-Study System Usability Questionnaire (PSSUQ), an instrument for assessing a system and forming a comprehensive impression of the end user's experience, in a case study of the website of the Housing and Settlement Service of Bengkulu Selatan Regency. The results showed an average System Usefulness score of 2.26, an average Information Quality score of 2.03, an average Interface Quality score of 1.91, and an average Overall score of 2.11, all against the PSSUQ norm's lower limit of 2.62.
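PSSUQ subscale scores like those reported above are plain means over fixed item ranges. A minimal sketch, assuming the 16-item PSSUQ version 3 grouping (items 1-6 System Usefulness, 7-12 Information Quality, 13-15 Interface Quality, all 16 Overall); lower scores on the 1-7 scale indicate better perceived usability:

```python
def pssuq_scores(items):
    """Subscale means for one respondent's 16-item PSSUQ (version 3).

    `items` is a list of sixteen 1-7 ratings in item order;
    lower means better perceived usability."""
    assert len(items) == 16

    def mean(xs):
        return sum(xs) / len(xs)

    return {
        "SysUse": mean(items[0:6]),     # items 1-6: system usefulness
        "InfoQual": mean(items[6:12]),  # items 7-12: information quality
        "IntQual": mean(items[12:15]),  # items 13-15: interface quality
        "Overall": mean(items),         # all 16 items
    }
```

Averaging these per-respondent scores across a sample gives figures directly comparable to the 2.26 / 2.03 / 1.91 / 2.11 values reported in the study.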
Article
Full-text available
Introduction Forensic psychiatric patients receive treatment to address their violent and aggressive behavior with the aim of facilitating their safe reintegration into society. On average, these treatments are effective, but the magnitude of effect sizes tends to be small, even when considering more recent advancements in digital mental health innovations. Recent research indicates that wearable technology has positive effects on the physical and mental health of the general population, and may thus also be of use in forensic psychiatry, both for patients and staff members. Several applications and use cases of wearable technology hold promise, particularly for patients with mild intellectual disability or borderline intellectual functioning, as these devices are thought to be user-friendly and provide continuous daily feedback. Method In the current randomized crossover trial, we addressed several limitations from previous research and compared the (continuous) usability and acceptance of four selected wearable devices. Each device was worn for one week by staff members and patients, amounting to a total of four weeks. Two of the devices were general purpose fitness trackers, while the other two devices used custom made applications designed for bio-cueing and for providing insights into physiological reactivity to daily stressors and events. Results Our findings indicated significant differences in usability, acceptance and continuous use between devices. The highest usability scores were obtained for the two fitness trackers (Fitbit and Garmin) compared to the two devices employing custom made applications (Sense-IT and E4 dashboard). The results showed similar outcomes for patients and staff members. 
Discussion None of the devices obtained usability scores that would justify recommendation for future use considering international standards; a finding that raises concerns about the adaptation and uptake of wearable technology in the context of forensic psychiatry. We suggest that improvements in gamification and motivational aspects of wearable technology might be helpful to tackle several challenges related to wearable technology.
Article
Full-text available
This study is a part of a research effort to develop the Questionnaire for User Interface Satisfaction (QUIS). Participants, 150 PC user group members, rated familiar software products. Two pairs of software categories were compared: 1) software that was liked and disliked, and 2) a standard command line system (CLS) and a menu driven application (MDA). The reliability of the questionnaire was high, Cronbach’s alpha=.94. The overall reaction ratings yielded significantly higher ratings for liked software and MDA over disliked software and a CLS, respectively. Frequent and sophisticated PC users rated MDA more satisfying, powerful and flexible than CLS. Future applications of the QUIS on computers are discussed.
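The reliability figure reported for the QUIS (Cronbach's alpha = .94) can be reproduced from raw item scores with the standard formula. A minimal sketch using made-up ratings:

```python
def cronbach_alpha(rows):
    """Cronbach's alpha for `rows`: one list of item scores per respondent.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))"""
    k = len(rows[0])  # number of items

    def var(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[i] for row in rows]) for i in range(k)]
    total_var = var([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# four respondents rating three (hypothetical) questionnaire items
ratings = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
print(round(cronbach_alpha(ratings), 3))  # high internal consistency
```

Values around .9 and above, as reported for the QUIS, indicate that the items consistently measure the same underlying construct.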
Article
Full-text available
Many usability practitioners conduct most of their usability evaluations to improve a product during its design and development. We call these "formative" evaluations to distinguish them from "summative" (validation) usability tests at the end of development. This article is online in an open-source journal, UXPA's Journal of Usability Studies: https://uxpajournal.org/wp-content/uploads/sites/7/pdf/JUS_Theofanos_Nov2005.pdf
Article
Full-text available
"Really, how many users do you need to test? Three answers, all different." ---User Experience, Vol. 4, Issue 4, 2005
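The "how many users" question usually reduces to the cumulative binomial problem-discovery model, 1 - (1 - p)^n, where p is the probability that one user encounters a given problem. A minimal sketch that inverts the model for a target discovery rate; the p = 0.31 example is the average visibility figure behind the classic "five users" rule:

```python
import math

def sample_size_for_discovery(p, target=0.85):
    """Smallest n such that 1 - (1 - p)**n >= target, where p is the
    probability that a single user encounters a given usability problem."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

print(sample_size_for_discovery(0.31, target=0.80))  # → 5
```

The "three answers, all different" point survives the arithmetic: the required n is highly sensitive to the assumed p and target, so studies aimed at rare problems or high discovery targets need far more than five users.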
Conference Paper
A high-fidelity prototype of an extended voice mail application was created. We tested it using three distinct usability testing paradigms so that we could compare the quantity and quality of the information obtained using each. The three methods employed were (1) heuristic evaluation, in which usability experts critique the user interface, (2) think-aloud testing, in which naive subjects comment on the system as they use it, and (3) performance testing, in which task completion times and error rates are collected as naive subjects interact with the system. The three testing methodologies were roughly equivalent in their ability to detect a core set of usability problems on a per evaluator basis. However, the heuristic and think-aloud evaluations were generally more sensitive, uncovering a broader array of problems in the user interface. Implications of these findings are discussed in terms of the costs of doing the evaluations and in light of other work on this topic.
Book
Usability testing and user experience research typically take place in a controlled lab with small groups. While this type of testing is essential to user experience design, more companies are also looking to test large sample sizes to be able to compare data across specific user populations and see how their experiences differ across user groups. But few usability professionals have experience in setting up these studies, analyzing the data, and presenting it in effective ways. Online usability testing offers a solution by allowing testers to elicit feedback simultaneously from thousands of users. Beyond the Usability Lab offers tried and tested methodologies for conducting online usability studies. It gives practitioners the guidance they need to collect a wealth of data through cost-effective, efficient, and reliable practices. The reader will develop a solid understanding of the capabilities of online usability testing, when it is and is not appropriate to use, and the various types of online usability testing techniques.
* The first guide for conducting large-scale user experience research using the internet
* Presents how to conduct online tests with thousands of participants, from start to finish
* Outlines essential tips for online studies to ensure cost-efficient and reliable results
Article
Matching of usability problem descriptions consists of determining which problem descriptions are similar and which are not. In most comparisons of evaluation methods matching helps determine the overlap among methods and among evaluators. However, matching has received scant attention in usability research and may be fundamentally unreliable. We compare how 52 novice evaluators match the same set of problem descriptions from three think aloud studies. For matching the problem descriptions the evaluators use either (a) the similarity of solutions to the problems, (b) a prioritization effort for the owner of the application tested, (c) a model proposed by Lavery and colleagues [Lavery, D., Cockton, G., Atkinson, M.P., 1997. Comparison of evaluation methods using structured usability problem reports. Behaviour and Information Technology, 16 (4/5), 246–266], or (d) the User Action Framework [Andre, T.S., Hartson, H.R., Belz, S.M., McCreary, F.A., 2001. The user action framework: a reliable foundation for usability engineering support tools. International Journal of Human–Computer Studies, 54 (1), 107–136]. The resulting matches are different, both with respect to the number of problems grouped or identified as unique, and with respect to the content of the problem descriptions that were matched. Evaluators report different concerns and foci of attention when using the techniques. We illustrate how these differences among techniques might adversely influence the reliability of findings in usability research, and discuss some remedies.
Article
For different levels of user performance, different types of information are processed and users will make different types of errors. Based on the error's immediate cause and the information being processed, usability problems can be classified into three categories. They are usability problems associated with skill-based, rule-based and knowledge-based levels of performance. In this paper, a user interface for a Web-based software program was evaluated with two usability evaluation methods, user testing and heuristic evaluation. The experiment discovered that the heuristic evaluation with human factor experts is more effective than user testing in identifying usability problems associated with skill-based and rule-based levels of performance. User testing is more effective than heuristic evaluation in finding usability problems associated with the knowledge-based level of performance. The practical application of this research is also discussed in the paper.