Introduction:
Current Issues in Usability Evaluation
James R. Lewis
IBM Corporation
In this introduction to the special issue of the International Journal of Human–Computer Interaction, I discuss some current topics in usability evaluation and indicate how the contributions to the issue relate to these topics. The contributions cover a wide range of topics in usability evaluation, including a discussion of usability science, how to evaluate usability evaluation methods, the effect and control of certain biases in the selection of evaluative tasks, a lack of reliability in problem detection across evaluators, how to adjust estimates of problem-discovery rates computed from small samples, and the effects of perception of hedonic and ergonomic quality on user ratings of a product's appeal.
1. INTRODUCTION
1.1. Acknowledgements
I start by thanking Gavriel Salvendy for the opportunity to edit this special issue on
usability. I have never done anything quite like this before, and it was both more dif-
ficult and more rewarding than I expected. The most rewarding aspect was working
with the contributing authors and reviewers who volunteered their time and effort
to the contents of this issue. The reviewers were Robert Mack (IBM T. J. Watson Re-
search Center), Richard Cordes (IBM Human Factors Raleigh), and Gerard Holle-
mans (Philips Research Laboratories Eindhoven). Several of the contributing au-
thors have indicated to me their appreciation for the quality of the reviews that they
received. To this I add my sincere appreciation. Providing a comprehensive, critical
review of an article is demanding work that advances the literature but provides al-
most no personal benefit to the reviewer (who typically remains anonymous).
Doing this work is truly the mark of the dedicated, selfless professional.
1.2. State of the Art in Usability Evaluation
I joined IBM as a usability practitioner in 1981 after getting a master’s degree in en-
gineering psychology. At that time, the standard practice in our product develop-
ment laboratory was to conduct scenario-based usability studies for diagnosing
and correcting usability problems. Part of the process of setting up such a study was
for one or more of the usability professionals in the laboratory to examine the prod-
uct (its hardware, software, and documentation) to identify potential problem areas
and to develop scenarios for the usability study. Any obvious usability problems
uncovered during this initial step went to Development for correction before run-
ning the usability study.
Alphonse Chapanis had consulted with IBM on the use of this rather straightfor-
ward approach to usability evaluation. Regarding an appropriate sample size, he
recommended 6 participants because experience had shown that after watching
about 6 people perform a set of tasks, it was relatively rare to observe the occur-
rence of additional important usability problems (Chapanis, personal communica-
tion, 1979).
A clear problem with this approach was that to run the usability study you
needed a working version of the product. Since I first started as a usability
practitioner, there have been two important lines of usability research that have af-
fected the field. One is the development of additional empirical and rational usabil-
ity evaluation methods (e.g., think aloud, heuristic evaluation, cognitive walk-
through, GOMS, QUIS, SUMI), many of which were motivated by the need to start
initial usability evaluation earlier in the development cycle. The other is the com-
parative evaluation of these usability methods (their reliability, validity, and rela-
tive efficiency). The last 20 years have certainly seen the introduction of more us-
ability evaluation tools for the practitioner’s toolbox and some consensus (and still
some debate) on the conditions under which to use the various tools. We have
mathematical models for the estimation of appropriate sample sizes for prob-
lem-discovery usability evaluations (rather than a simple appeal to experience;
Lewis, 1982, 1994; Nielsen & Landauer, 1993; Virzi, 1990, 1992). Within the last 12
years usability researchers have published a variety of usability questionnaires
with documented psychometric properties (e.g., Chin, Diehl, & Norman, 1988;
Kirakowski & Corbett, 1993; Lewis, 1995, 1999).
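The mathematical models cited above share the cumulative binomial discovery form 1 − (1 − p)^n, where p is the average probability that a single participant reveals a given problem and n is the number of participants. As a minimal sketch (assuming a single average value of p, although in practice individual problems vary in how likely they are to be discovered), a practitioner might use the model for sample-size planning as follows; the function name and the value p = 0.30 are illustrative assumptions, not figures from this article:

```python
import math

def sample_size_for_discovery(p, target=0.80):
    """Smallest n such that 1 - (1 - p)**n >= target.

    p      : average probability that one participant encounters a given problem
    target : desired proportion of existing problems to discover
    """
    if not (0 < p < 1 and 0 < target < 1):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))

# Illustrative check against the informal "about 6 participants" advice:
# if p were around 0.30, six participants would be expected to reveal
# 1 - 0.7**6, or roughly 88%, of the problems.
print(sample_size_for_discovery(0.30, 0.85))  # -> 6
print(round(1 - (1 - 0.30) ** 6, 3))          # -> 0.882
```

The p = 0.30 used here is purely illustrative; obtaining a trustworthy estimate of p is itself one of the open questions raised below.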
Yet many questions remain. For example:
• Do we really understand the differences among the various usability evaluation methods in common use by practitioners? Do we have a good idea about how to compare usability evaluation methods?
• How do the tasks selected for a scenario-based usability evaluation affect the outcome of the evaluation? How do we decide what tasks to include in an evaluation and what tasks to exclude?
• How does the implicit assumption that we only ask participants to do tasks that are possible with a system affect their performance in a usability evaluation?
• Usability evaluation based on the observation of participants completing tasks with a system is the gold standard for usability evaluation, with other approaches generally considered discounted by one criterion or another. Usability practitioners assume that the same usability problems uncovered by one laboratory would, for the most part, be discovered if evaluated in another laboratory. To what extent do they know that this assumption is true? What are the implications for the state of the art if the assumption is not true?
• Sample size estimation for usability evaluations depends on having an estimate of the rate of problem discovery (p) across participants (or, in the case of heuristic evaluation, evaluators). It turns out, though, that estimates of this rate based on small-sample usability studies are necessarily inflated (Hertzum & Jacobsen, this issue). Is there any way to adjust for this bias so practitioners can use small-sample estimates of p to develop realistic estimates of the true problem-discovery rate and thereby estimate accurate sample size requirements for their studies?
• Is usability enough to assure the success of commercial products in a competitive marketplace? To what extent is "likeability" or "appealingness" affected by or independent of usability? How can we begin to measure this attribute?
• We feel like we know one when we see one, but what is the real definition of a usability problem? What is the appropriate level at which to record usability problems?
2. CONTRIBUTIONS TO THIS ISSUE
The contributions to this issue do not answer all of these questions, but they do ad-
dress a substantial number of them.
2.1. “Usability Science. I: Foundations”
In this first article of the issue, Gillan and Bias describe the emerging discipline of
usability science. The development and qualification of methods for usability de-
sign and evaluation require a scientific approach, yet they may well be distinct from
other similar disciplines such as human factors engineering or human–computer
interaction. In their article, Gillan and Bias reach across various modern disciplines
and into the history of psychology to develop their arguments.
2.2. “Criteria for Evaluating Usability Evaluation Methods”
Hartson, Andre, and Williges (this issue) tackle the problem of how to compare us-
ability evaluation methods. The presence of their article is additional testimony to
the importance of the recent critique by Gray and Salzman (1998) on potentially
misleading research (“damaged goods”) in the current literature of studies that
compare usability evaluation methods. Developing reliable and valid comparisons
of usability evaluation methods is a far from trivial problem, and Hartson et al. lay
out a number of fundamental issues on how to measure and compare the outputs of
different types of usability evaluation methods.
2.3. “Task-Selection Bias: A Case for User-Defined Tasks”
Cordes (this issue) provides evidence that participants in laboratory-based usabil-
ity evaluations assume that the tasks that evaluators ask them to perform must be
possible and that manipulations to bring this assumption into doubt have dramatic
effects on a study’s quantitative usability measures (a strong bias of which many us-
ability practitioners are very likely unaware). Cordes discusses how introducing
user-defined tasks into an evaluation can help control for this bias and presents
techniques for dealing with the methodological and practical consequences of in-
cluding user-defined tasks in a usability evaluation.
2.4. “Evaluator Effect: A Chilling Fact
About Usability Evaluation Methods”
The title of the article by Hertzum and Jacobsen (this issue) might seem a bit ex-
treme, but the evidence they present is indeed chilling. Their research indicates that
the most widely used usability evaluation methods suffer from a substantial evalu-
ator effect—that the set of usability problems uncovered by one observer often
bears little resemblance to the sets described by other observers evaluating the same
interface. They discuss the conditions that affect the magnitude of the evaluator ef-
fect and provide recommendations for reducing it.
For me, this is the most disturbing article in the issue, in part because other in-
vestigators have recently reported similar findings (Molich et al., 1998). The possi-
bility that usability practitioners might be engaging in self-deception regarding the
reliability of their problem-discovery methods is reminiscent of clinical psycholo-
gists who apply untested evaluative techniques (such as projective tests), continu-
ing to have faith in their methods despite experimental evidence to the contrary
(Lilienfeld, Wood, & Garb, 2000). Although we might not like the results of
Hertzum and Jacobsen (this issue), we need to understand them and their implica-
tions for how we do what we do as usability practitioners.
Often usability practitioners only have a single opportunity to evaluate an inter-
face, so there is no way to determine if their usability interventions have really im-
proved an interface. In my own experience though, when I have conducted a stan-
dard scenario-based, problem-discovery usability evaluation with one observer
watching multiple participants complete tasks with an interface and have done so
in an iterative fashion, the measurements across iterations consistently indicate a
substantial and statistically reliable improvement in usability. This leads me to be-
lieve that, despite the potential existence of a substantial evaluator effect, the appli-
cation of usability evaluation methods (at least, methods that involve the observa-
tion of participants performing tasks with a product under development) can
result in improved usability (e.g., see Lewis, 1996). An important task for future re-
search in the evaluator effect will be to reconcile this effect with the apparent reality
of usability improvement achieved through iterative application of usability evalu-
ation methods.
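One way to make the evaluator effect concrete is to measure how much the problem sets reported by different evaluators of the same interface overlap, for example as the Jaccard ratio averaged over all pairs of evaluators (a measure in the spirit of the agreement statistics Hertzum and Jacobsen discuss). The sketch below uses invented problem identifiers rather than data from their study:

```python
from itertools import combinations

def any_two_agreement(problem_sets):
    """Average Jaccard overlap between every pair of evaluators' problem sets.

    1.0 means every evaluator reported exactly the same problems; values near
    0 indicate little overlap, i.e., a large evaluator effect.
    """
    pairs = list(combinations(problem_sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Hypothetical problem IDs reported by three evaluators of the same interface.
evaluators = [
    {"P1", "P2", "P3", "P7"},
    {"P2", "P4", "P5"},
    {"P1", "P2", "P6", "P8", "P9"},
]
print(round(any_two_agreement(evaluators), 2))  # about 0.20: little overlap
```

Low values of this kind of statistic are what make the article's findings so unsettling for practitioners who rely on a single evaluator.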
2.5. “Evaluation of Procedures for Adjusting Problem-
Discovery Rates Estimated From Small Samples”
For many years I have promoted the measurement of p rates from usability studies
for the dual purpose of (a) projecting required sample sizes and (b) estimating the
proportion of discovered problems for a given sample size (Lewis, 1982, 1994, 2001).
I have always made the implicit assumption that values of p estimated from small
samples would have properties similar to those of a mean—that the variance would
be greater than for studies with larger sample sizes, but in the long run estimates of p
would be unbiased. I was really surprised (make that appalled) when Hertzum and
Jacobsen (this issue) demonstrated in their article that estimates of p based on small
samples are almost always inflated. The consequence of this is that practitioners
who use small-sample estimates of p to assess their progress when running a usabil-
ity study will think they are doing much better than they really are. Practitioners
who use small-sample estimates of p to project required sample sizes for a usability
study will seriously underestimate the true sample size requirement.
This spurred me to investigate whether there were any procedures that could re-
liably compensate for the small-sample inflation of p. If not, then it would be im-
portant for practitioners to become aware of this limitation and to stop using
small-sample estimates of p. If so, then it would be important for practitioners to
begin using the appropriate adjustment procedure(s) to ensure accurate assess-
ment of sample size requirements and proportions of discovered problems. Fortu-
nately, techniques based on observations by Hertzum and Jacobsen (this issue) and
a discounting method borrowed from statistical language modeling can produce
very accurate adjustments of p.
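A small Monte Carlo sketch (with invented numbers, not the procedures evaluated in the article) shows where the inflation comes from: problems that no participant happens to encounter drop out of the observed problem-by-participant matrix, so the naive estimate of p is conditioned on discovery. The deflation applied below is a simple normalization of the form (p − 1/n)(1 − 1/n), included only to illustrate the general idea of adjustment; the article itself evaluates specific procedures, including the discounting method from statistical language modeling mentioned above, that are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
true_p, n_problems, n_participants, n_runs = 0.15, 30, 4, 5000

naive, adjusted = [], []
for _ in range(n_runs):
    # Rows = problems, columns = participants; True = problem observed.
    hits = rng.random((n_problems, n_participants)) < true_p
    observed = hits[hits.any(axis=1)]   # undiscovered problems simply vanish
    if observed.size == 0:
        continue
    p_est = observed.mean()             # naive small-sample estimate of p
    naive.append(p_est)
    # Simple illustrative deflation; on its own it overcorrects at very small n.
    adjusted.append((p_est - 1 / n_participants) * (1 - 1 / n_participants))

print(f"true p:                 {true_p:.2f}")
print(f"mean naive estimate:    {np.mean(naive):.2f}")   # noticeably inflated
print(f"mean deflated estimate: {np.mean(adjusted):.2f}")
```

The gap between the naive and the crudely deflated estimates is exactly why the more careful combined adjustment procedures evaluated in the article are needed.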
2.6. “Effect of Perceived Hedonic Quality on Product Appealingness”
Within the IBM User-Centered Design community and outside of IBM (e.g., see
Tractinsky, Katz, & Ikar, 2000), there has been a growing emphasis over the last few
years on extending user-centered design beyond traditional usability issues and
addressing the total user experience. One factor that has inhibited this activity is the pau-
city of instruments for assessing nontraditional aspects of users’ emotional re-
sponses to products. Hassenzahl (this issue) has started a line of research in which
he uses semantic differentials to measure both ergonomic and hedonic quality and
relates these measurements to the appealingness of a product. Although there is still
a lot of work to do to validate these measurements, it is a promising start that should
be of interest to practitioners who have an interest in the total user experience.
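As a concrete (and entirely hypothetical) illustration of the general approach, the sketch below computes ergonomic-quality, hedonic-quality, and appeal scores from 7-point semantic-differential ratings and correlates the quality scores with appeal. The item groupings and ratings are invented for illustration and do not reproduce Hassenzahl's instrument:

```python
import numpy as np

# Hypothetical 7-point semantic-differential ratings from five participants.
# Which items load on which scale is assumed here for illustration only.
ratings = np.array([
    # erg1 erg2 erg3 hed1 hed2 hed3 appeal
    [6, 5, 6, 3, 4, 3, 5],
    [4, 4, 5, 6, 6, 5, 6],
    [2, 3, 2, 2, 3, 2, 2],
    [5, 6, 6, 6, 7, 6, 7],
    [3, 3, 4, 5, 5, 6, 5],
])

ergonomic = ratings[:, 0:3].mean(axis=1)   # scale score = mean of its items
hedonic = ratings[:, 3:6].mean(axis=1)
appeal = ratings[:, 6]

print("r(ergonomic, appeal):", round(float(np.corrcoef(ergonomic, appeal)[0, 1]), 2))
print("r(hedonic, appeal):  ", round(float(np.corrcoef(hedonic, appeal)[0, 1]), 2))
```

With real data, the interesting question is how much hedonic quality relates to appeal over and above ergonomic quality, which is the kind of relation Hassenzahl examines.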
3. CONCLUSIONS
I hope this special issue will be of interest to both usability scientists and practitio-
ners. The contributors are from both research (Gillan, Hartson, Andre, Williges, and
Hertzum) and applied (Bias, Cordes, Jacobsen, Lewis, and Hassenzahl) settings,
with two of the articles being collaborations between the settings (Gillan & Bias, this issue;
Hertzum & Jacobsen, this issue).
The articles in this special issue address many of the topics listed in Section
1.2 (although there is still much work to do). One important outstanding issue,
though, is the development of a definition of what constitutes a real usability prob-
lem with which a broad base of usability scientists and practitioners can agree. This
is a topic that comes up in half of the articles in this issue (Hartson et al.; Hertzum &
Jacobsen; Lewis) but is one for which I have not yet seen a truly satisfactory treat-
ment (but see the following for some current work in this area: Cockton & Lavery,
1999; Connell & Hammond, 1999; Hassenzahl, 2000; Lavery, Cockton, & Atkinson,
1997; Lee, 1998; Virzi, Sokolov, & Karis, 1996).
Despite the unanswered questions, I believe that the field of usability engineer-
ing is in much better shape than it was 20 years ago (both methodologically and
with regard to the respect of product developers for usability practitioners), and I
look forward to seeing and participating in the developments that will occur over
the next 20 years. I also look forward to seeing the effect (if any) that the articles
published in this special issue will have on the course of future usability research
and practice.
REFERENCES
Chin, J. P., Diehl, V. A., & Norman, L. K. (1988). Development of an instrument measuring
user satisfaction of the human–computer interface. In Conference Proceedings of Human Fac-
tors in Computing Systems CHI ’88 (pp. 213–218). Washington, DC: Association for Com-
puting Machinery.
Cockton, G., & Lavery, D. (1999). A framework for usability problem extraction. In Hu-
man–Computer Interaction—INTERACT ’99 (pp. 344–352). Amsterdam: IOS Press.
Connell, I. W., & Hammond, N. V. (1999). Comparing usability evaluation principles with
heuristics: Problem instances vs. problem types. In Human–Computer Interaction—INTER-
ACT ’99 (pp. 621–629). Amsterdam: IOS Press.
Gray, W. D., & Salzman, M. C. (1998). Damaged merchandise? A review of experiments that
compare usability evaluation methods. Human–Computer Interaction, 13, 203–261.
Hassenzahl, M. (2000). Prioritizing usability problems: Data-driven and judgement-driven
severity estimates. Behaviour and Information Technology, 19, 29–42.
Kirakowski, J., & Corbett, M. (1993). SUMI: The Software Usability Measurement Inventory.
British Journal of Educational Technology, 24, 210–212.
Lavery, D., Cockton, G., & Atkinson, M. P. (1997). Comparison of evaluation methods using
structured usability problem reports. Behaviour and Information Technology, 16, 246–266.
Lee, W. O. (1998). Analysis of problems found in user testing using an approximate model of
user action. In People and Computers XIII: Proceedings of HCI ’98 (pp. 23–35). Sheffield, Eng-
land: Springer-Verlag.
Lewis, J. R. (1982). Testing small-system customer set-up. In Proceedings of the Human Factors
Society 26th Annual Meeting (pp. 718–720). Santa Monica, CA: Human Factors Society.
Lewis, J. R. (1994). Sample sizes for usability studies: Additional considerations. Human Fac-
tors, 36, 368–378.
Lewis, J. R. (1995). IBM computer usability satisfaction questionnaires: Psychometric evalua-
tion and instructions for use. International Journal of Human–Computer Interaction, 7, 57–78.
Lewis, J. R. (1996). Reaping the benefits of modern usability evaluation: The Simon story. In
A. F. Ozok & G. Salvendy (Eds.), Advances in applied ergonomics: Proceedings of the 1st Inter-
national Conference on Applied Ergonomics (pp. 752–757). Istanbul, Turkey: USA Publishing.
Lewis, J. R. (1999). Trade-offs in the design of the IBM computer usability satisfaction ques-
tionnaires. In H. Bullinger & J. Ziegler (Eds.), Human–computer interaction: Ergonomics and
user interfaces—Vol. 1 (pp. 1023–1027). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Lewis, J. R. (2001). Sample size estimation and use of substitute audiences (Tech. Rep. No. 29.3385).
Raleigh, NC: IBM. Available from the author.
Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2000). The scientific status of projective tech-
niques. Psychological Science in the Public Interest, 1, 27–66.
Molich, R., Bevan, N., Curson, I., Butler, S., Kindlund, E., Miller, D., & Kirakowski, J. (1998).
Comparative evaluation of usability tests. In Usability Professionals Association Annual Con-
ference Proceedings (pp. 189–200). Washington, DC: Usability Professionals Association.
Nielsen, J., & Landauer, T. K. (1993). A mathematical model of the finding of usability prob-
lems. In Conference Proceedings on Human Factors in Computing Systems—CHI ’93 (pp.
206–213). New York: Association for Computing Machinery.
Tractinsky, N., Katz, A. S., & Ikar, D. (2000). What is beautiful is usable. Interacting With Com-
puters, 13, 127–145.
Virzi, R. A. (1990). Streamlining the design process: Running fewer subjects. In Proceedings of
the Human Factors Society 34th Annual Meeting (pp. 291–294). Santa Monica, CA: Human
Factors Society.
Virzi, R. A. (1992). Refining the test phase of usability evaluation: How many subjects is
enough? Human Factors, 34, 443–451.
Virzi, R. A., Sokolov, J. L., & Karis, D. (1996). Usability problem identification using both low-
and high-fidelity prototypes. In Proceedings on Human Factors in Computing Systems CHI
’96 (pp. 236–243). New York: Association for Computing Machinery.