Conference Paper

Current Issues in the Determination of Usability Test Sample Size: How Many Users is Enough?

Authors:
  • MeasuringU
  • Triangle Business Architecture

Abstract

The topic of "how many users" is of great interest to usability specialists who need to balance project concerns over ROI and timelines with their own concerns about designing usable interfaces. The panel will review the current controversies over usability test sample size, test validity, and reliability.
... According to previous research, a minimum sample size of 5 is required to uncover 80% of usability issues [54], [55]. A sample size of 12 is required to obtain statistically significant results when analyzing performance metrics and success rate [54], [55]. Recent usability studies employ between 9 and 24 users [11], [56], [57]. ...
Article
Recent advancements in the web allow users to generate multimedia content, resulting in a proliferation of multimedia information. Existing search engines provide access to multimedia content via a disjoint assembly of media-specific results called verticals. However, this decentralized assembly of media content requires manual aggregation and synthesis at the user’s end, which hinders information exploration and may cause cognitive overload, and therefore demands innovative tools for discovering multimedia content. Researchers have devised numerous state-of-the-art approaches; however, analysis confirming their efficacy has received little emphasis. This study investigates users’ complex multimedia information-seeking behavior over state-of-the-art web search systems to unveil users’ information-seeking issues. Our research employs a between-subjects study and post hoc analysis strategies to analyze participants’ information-seeking characteristics. The study design adopted statistical hypothesis testing to consolidate previous user behavioral studies, confirm existing strategies, and present recommended practices for future general-purpose web search engines. The participants were assigned Google and an advanced discovery search system using the same multimedia dataset to ensure the credibility of the results. The primary behavioral parameters include search effort, multimedia content exploration, search user interface (SUI), information management and presentation, and user cognition. This study uncovers several inadequacies of search engines in meeting users’ complex discovery needs, including 29.6% less user engagement, 43% system and searching dissatisfaction, and 32% less knowledge acquisition with 63.9% increased clicking effort on traditional search engines. The results confirm previous user studies and yield statistically significant research recommendations for multimedia information exploration.
... Testing each participant individually took 1-2 hours, limiting the number of possible tests within the project time frame. At this stage of our early prototype, the number of test persons can be considered sufficient, since as few as 3-5 users can uncover 75-99% of usability problems [26]. ...
... Five participants, including three graduate students, a professor, and a vision teacher (VT), volunteered to test the vision therapy games. Fan et al. [42] argued that using different groups of participants can help gather usability feedback from various perspectives, and in the early stages of a prototype, results from a few, but representative, users can be important [43]. Because vision experts (here, a vision teacher) enable technology use, and students can be considered end-users, we examined the attitudes and usability from the perspectives of the two main stakeholder groups. ...
Article
Eye-tracking technologies (ETs) and serious games (SGs) have emerged as new methods promising better support for vision screening and training. Previous research has shown the practicality of eye-tracking technology for vision screening in health care, but there remains a need for studies showing that the effective utilization of SGs and ETs is beneficial for vision training. This study investigates the feasibility of SGs and ETs for vision training by designing, developing, and evaluating a prototype influenced by commercially available games, based on a battery of exercises previously defined by vision experts. Data were collected from five participants, including a vision teacher, through a user experience questionnaire (UEQ) following a mixed method. Data analysis of the UEQ results and interviews highlighted the current challenges and positive attitudes in using SGs and ETs for vision training. In conjunction with UEQ indicators such as attractiveness and perspicuity, the stimulation of the vision training battery based on the user experience provided insights into using ETs and further developing SGs to better approach different eye movements for vision training.
... Activities for this function contain the actual testing with a larger group of participants. The number of tested users may vary to a certain degree with the system parameters, but as a rule of thumb, the minimum sample size should follow that used in other studies (Turner, Nielsen, & Lewis, 2004). In addition, control groups should be considered in certain scenarios. ...
Thesis
This thesis asks whether currently available e-government systems are accepted by older users, and how such systems must be designed so that this user group accepts them as a useful alternative to visiting government offices in person. We investigated how applications that public administrations offer to the entire population should be designed so that the entire population can use them successfully. To answer this question, we conducted a three-phase study modeled on the ISO 9241-210 development model. The research was carried out in parallel in Germany and Hungary in cooperation with Fraunhofer FOKUS, the Bundesministerium des Innern, the Bundesdruckerei, and the Corvinus Universität Budapest. In the first phase, we studied the expectations and prior knowledge of the target group in order to establish the key parameters and premises. In the second phase, these findings enabled a well-founded selection of an application to serve as the basis for user tests. The test application was the AusweisApp of the electronic identity card. In these tests, user errors were recorded and acceptance was measured with the ASQ method. Based on the findings, we developed the IGUAN guideline, which provides a standardized approach to increasing acceptance. In addition to special requirements adapted to older users, the concept includes a catalog of criteria and a mapping of the processes through which acceptance among older users can be increased. In the third phase, the guideline was evaluated and tested through iterative prototype development. We were able to show that interface improvements bring e-government applications closer to an aging society, increase motivation, and lastingly improve the user experience.
... It would also be ideal to have at least five or seven test subjects [64,65,66]. ...
... When investigating the number of subjects needed for usability tests, a Poisson probability model has been found to be a reasonable fit to extant data (Nielsen & Landauer, 1993; Virzi, 1990, 1992). In perhaps the most contemporary review of issues relating to usability testing, Turner, Nielsen, and Lewis (2002) identified two central concerns: the reliability of traditional testing procedures, and the validity of the traditional model of problem detection. More specifically, regarding the formula used for estimating problem detection, they questioned whether the probability of a problem being detected can be modeled fairly with a unitary probability value. ...
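The problem-detection formula discussed in this excerpt is commonly written as 1 - (1 - p)^n, where p is the probability that a single user encounters a given problem and n is the number of users tested. The Python sketch below is illustrative only: the function names are hypothetical, and p = 0.31 is an assumed value (the average often associated with Nielsen & Landauer, 1993), not a figure taken from this page. The last helper illustrates the concern about a unitary p by averaging over problems that have different detection probabilities.

```python
import math
from typing import Sequence


def discovery_rate(p: float, n: int) -> float:
    """Expected share of problems found by n users when every problem
    has the same per-user detection probability p (the unitary-p assumption)."""
    return 1.0 - (1.0 - p) ** n


def users_needed(p: float, target: float) -> int:
    """Smallest n for which discovery_rate(p, n) reaches the target share."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


def mean_discovery_rate(ps: Sequence[float], n: int) -> float:
    """Expected share found when each problem has its own probability in ps."""
    return sum(discovery_rate(p, n) for p in ps) / len(ps)


if __name__ == "__main__":
    # Assumed value for illustration, not a result from the panel paper.
    print(round(discovery_rate(0.31, 5), 3))
    print(users_needed(0.31, 0.80))
    # Same mean p (0.31), but spread unevenly across three problems.
    print(round(mean_discovery_rate([0.05, 0.31, 0.57], 5), 3))
```

Under these assumed inputs, five users find roughly 84% of problems when p is uniform, while the same mean p spread unevenly across problems yields a lower expected share. That gap is one way to see why a single probability value may overstate what a small sample finds.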
Article
How can one determine efficiently if an informational website or an e-learning product is working well? Relatively small numbers of the target audience are needed to improve a product during formative evaluation and usability testing as part of product development and revision cycles. However, during summative evaluation, how many subjects are needed to determine product effectiveness? When investigating the number of subjects needed for usability tests, a Poisson probability model was found to be a reasonable fit to extant data (Nielsen & Landauer, 1993; Virzi, 1990, 1992). However, this model was chosen on the basis of the number of subjects needed to identify important usability problems with a product, not for determining its effectiveness. To determine if a Website or e-learning product is working well, we investigated the predictive validity of a discrete Bayesian decision model: the Sequential Probability Ratio Test (SPRT) --originally developed by Wald (1947). Fifty-one people representing a campus community participated in a usability test of the university library online catalog search tool, and the results were analyzed post hoc with SPRT re-enactments to simulate sequential decision making after testing each subject. Across a range of parameters, the Bayesian SPRT reached the same conclusion as reflected by the entire sample with many fewer subjects, utilizing typically small Type I and II error rates. The study provides evidence of the usefulness of the SPRT decision model in situations where determination of effectiveness is the goal (product works well or not). The SPRT maximizes efficiency by testing only as many users as necessary to reach a confident conclusion.
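The sequential decision making described in this abstract can be illustrated with a minimal sketch of Wald's classical SPRT for pass/fail outcomes. This is not the study's implementation: the function name, the hypothesized success rates `p0` and `p1`, and the error rates `alpha` and `beta` are all assumptions chosen for illustration, and the sketch uses the standard likelihood-ratio form with Wald's approximate thresholds rather than the Bayesian formulation the authors describe.

```python
import math
from typing import Iterable, Tuple


def sprt_bernoulli(
    outcomes: Iterable[bool],
    p0: float,
    p1: float,
    alpha: float = 0.05,
    beta: float = 0.05,
) -> Tuple[str, int]:
    """Test H0 (success rate = p0) against H1 (success rate = p1 > p0).

    Returns the decision and the number of users tested when it was reached.
    All parameter values are illustrative assumptions.
    """
    upper = math.log((1.0 - beta) / alpha)   # cross this: accept H1
    lower = math.log(beta / (1.0 - alpha))   # cross this: accept H0
    llr = 0.0  # cumulative log-likelihood ratio of H1 versus H0
    n = 0
    for n, success in enumerate(outcomes, start=1):
        if success:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    # Ran out of observations before either threshold was crossed.
    return "continue", n


if __name__ == "__main__":
    # Hypothetical data: every tested user completes the task.
    print(sprt_bernoulli([True] * 20, p0=0.5, p1=0.8))
```

The test stops as soon as the evidence crosses either threshold, so the number of users tested depends on the data rather than being fixed in advance.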
... However, those who aim to identify all the usability issues argue for larger samples of more than eight users, although these users should not all be tested at the same time (Perfetti & Landesman, 2001). There is considerable discussion of test validity and reliability, criticism of the assumptions of small-sample paradigms on methodological and empirical grounds, and important issues of user variability that can influence the choice of sample size (for an in-depth commentary see Turner, Nielsen & Lewis, 2002). Where the sample is largely homogeneous, smaller samples of three to five users can work well to identify key issues (Lewis, 1982), whereas larger samples are more appropriate when the user group is more varied or when the highest capture rate of issues is needed (Woolrych & Cockton, 2001). ...
Article
Distributed technologies and ubiquitous computing now support users who may be detached or decoupled from traditional interactions. In order to investigate the potential usability of speech and manual input devices, an evaluation of speech input across different user groups and a usability assessment of independent-user and collaborative-user interactions was conducted. Whilst the primary focus was on a formative usability evaluation, the user group evaluation provided a formal basis to underpin the academic rigor of the exercise. The results illustrate that using a speech interface is important in understanding user acceptance of such technologies. From the usability assessment it was possible to translate interactions and make them compatible with innovative input devices. This approach to interaction is still at an early stage of development, and the potential or validity of this interfacing concept is still under evaluation; however, as a concept demonstrator, the results of these initial evaluations demonstrate the potential usability issues of both input devices as well as highlighting their suitability for advanced virtual applications.
Conference Paper
Abstract: In contemporary teaching practice linked to virtual environments connected to the Web, teachers are reformulating the modes and approaches of their interventions, their classes, and their teaching materials. Accordingly, the resulting learning processes require digital materials that express and consolidate these changes. This gives relevance to the concept of the Learning Object (Objeto de Aprendizaje, OA). In this context, institutions that promote educational experiences based on the use of digital educational resources, both internal and external, should rethink the production life cycles of those resources. Mechanisms would be established to keep resources current and raise overall quality, such as blurring the boundaries between producers and consumers of educational resources, that is, between authors and teachers on the one hand and students on the other. An OA may sometimes comply with e-learning standards and enable interoperability, yet its authors lose sight of the ultimate goal of the students who will interact with it: learning. Nevertheless, from an epistemological standpoint, OAs can be approached from various theoretical and methodological perspectives on design and use in education, according to different possible subject-object relationships. Along these lines, we consider that OAs should be designed using concepts and methodologies from Human-Computer Interaction (Interacción Persona-Ordenador, IPO). Under the IPO framework, the student does not perform a task in isolation but is immersed in, and interacts within, a socio-cultural context. For this to be possible, OA development is a complex process in which each of these components must be addressed with an equal degree of commitment, avoiding the frequent error of focusing only on the technological side and neglecting the human side.
This article presents a proposal for evaluating the quality of an OA through a selection of criteria that analyze functionality, standards, and pedagogical aspects. This integrative evaluation approach involves experts and teachers in a first stage and end users (students) in a second stage. The article also presents results from two field studies in a university setting that establish the experimental basis of the study.
Article
Renal transplant recipients are expected to adhere to a lifelong therapeutic regimen designed to preserve long-term graft function and to reduce the risk of complications. Adherence to immunosuppression is a critical component of this regimen, but studies using electronic monitoring, the most sensitive tool currently available, have found non-adherence rates of 20-26% in adult patients, whereas a mean prevalence of 32% has been reported among adolescent renal transplant recipients. Non-adherence after renal transplantation is an important clinical problem because even comparatively low rates of non-adherence are associated with increased risks of acute rejection, graft loss, reduced quality of life, and mortality. All members of the transplant team including hospital-based and community nephrologists, surgeons, nurses and therapists, should be aware of the possibility of non-adherence and be prepared to intervene. Promoting adherence is not straightforward, because risk factors for non-adherence are multifactorial and individual to each patient. As a result, intervention is more likely to promote lasting adherence if it is long term and takes place within the context of a chronic-illness management model that integrates behavioural, psychosocial and medical aspects of care appropriate to the unique needs of the individual patient.