Article

Measuring Perceived Usability: SUS, UMUX, and CSUQ Ratings for Four Everyday Products

Taylor & Francis
International Journal of Human-Computer Interaction
Authors:
  • James R. Lewis (MeasuringU)

Abstract

This research continued previous investigation of the relationships among measures of perceived usability: the System Usability Scale (SUS), three metrics derived from the Usability Metric for User Experience (UMUX), and the Computer System Usability Questionnaire (CSUQ), this time with ratings of four everyday products (Excel, Word, Amazon, and Gmail). SUS ratings of these products were generally consistent with previous reports. Significant differences in SUS means across studies could be due to differences in frequency of use, with implications for using these data as usability benchmarks. Correspondence among the various measures of perceived usability was also consistent with previous research. Considering frequency of use, mean differences ranged from -2.0 to 1.8 (average shift in Sauro-Lewis grade range from -0.6 to 0.8). When SUS scores were above average, the range restriction of the UMUX-LITEr led to relatively large discrepancies with SUS, suggesting it might not always be better than the unadjusted UMUX-LITE.


... The SUS, administered as a survey, has been widely used as an instrument for measuring system usability [10]. The SUS consists of 10 items. ...
... I feel there are no obstacles to using this system. 10 I need to familiarize myself first before using ...
... After the survey results are obtained, the SUS score is computed; following [13], each odd-numbered item (1, 3, 5, 7, 9) has 1 subtracted from its rating, while each even-numbered item (2, 4, 6, 8, 10) is scored by subtracting its rating from 5. The resulting item contributions are summed and multiplied by 2.5. The final result is the SUS score. ...
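To make the scoring procedure quoted above concrete, here is a minimal sketch of the standard SUS computation in Python. It is illustrative only (the function name and example ratings are not from the cited papers) and assumes ten raw responses on the usual 1-5 agreement scale.

    # Minimal sketch of standard SUS scoring as summarized above;
    # the function name and example ratings are illustrative.
    def sus_score(responses):
        """Compute a 0-100 SUS score from ten raw item ratings (1-5)."""
        if len(responses) != 10:
            raise ValueError("SUS requires exactly 10 item ratings")
        total = 0
        for i, rating in enumerate(responses, start=1):
            if i % 2 == 1:
                total += rating - 1      # odd-numbered (positively worded) items
            else:
                total += 5 - rating      # even-numbered (negatively worded) items
        return total * 2.5

    # Example: a fairly favorable response pattern
    print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 3]))  # -> 80.0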
Article
After the pandemic, the use of e-learning is no longer mandatory at Universitas X because in-person lectures have resumed, so applications that were previously built, such as the EDMODO application formerly used as Universitas X's e-learning system, are no longer used to their full extent. This change in conditions has occurred not only at Universitas X but also elsewhere, and it creates a need for research on the post-pandemic impact on the e-learning currently used by students and lecturers at Universitas X. This research was conducted quantitatively using a questionnaire based on the System Usability Scale instrument, with members of the Universitas X academic community as respondents. Based on the calculations performed, the SUS score for Universitas X's e-learning was 68.97. This score indicates that the system currently in use is categorized as reasonably good. However, the SUS score of the current e-learning system is lower than the score obtained in previous research using EDMODO as the e-learning application.
... The user's perception of UML-ITS was evaluated using the Computer System Usability Questionnaire (CSUQ), shown in Fig 2, which is one of the most widely used usability questionnaires. The CSUQ was developed by James Lewis [15,16] at IBM; it is based on a 7-point scale and uses 16 statements, all positive in tone. Participants rate their agreement or disagreement with each statement on a range from strongly agree (1 = best experience) to strongly disagree (7 = worst experience). ...
... Only positive values of the coefficient alpha may be interpreted, and it can vary from 0 (totally untrustworthy) to 1 (absolutely reliable). Although coefficient alpha is technically a measure of internal consistency, it is the most often used approach for assessing reliability, both for overall questionnaire assessment and for any subscales supported by factor analysis [16]. Fig 3 shows the overall and subscales' scores mean Cronbach's alpha reliability. ...
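As a concrete illustration of the coefficient alpha discussed above, the following Python sketch computes Cronbach's alpha for a respondents-by-items matrix. It is not the cited authors' code, and the example ratings are hypothetical.

    # Minimal sketch of coefficient alpha (Cronbach's alpha); rows are
    # respondents, columns are questionnaire items. Illustrative only.
    import numpy as np

    def cronbach_alpha(item_scores):
        """Estimate internal consistency for an (n_respondents x k_items) array."""
        scores = np.asarray(item_scores, dtype=float)
        k = scores.shape[1]
        sum_of_item_variances = scores.var(axis=0, ddof=1).sum()
        variance_of_totals = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - sum_of_item_variances / variance_of_totals)

    # Hypothetical 7-point ratings from five respondents on four items
    ratings = np.array([
        [6, 6, 5, 6],
        [4, 5, 4, 4],
        [7, 7, 6, 7],
        [3, 3, 3, 2],
        [5, 6, 5, 5],
    ])
    print(round(cronbach_alpha(ratings), 2))  # -> 0.98 for these made-up data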
... The average scores were then translated to a scale of 0 to 100, with a higher number indicating a better experience. The following equations proposed by [16] were used for translation purposes: ...
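The exact equations from [16] are not reproduced in the excerpt, but the usual linear interpolation from a 1-7 mean (1 = best) to a 0-100 scale (100 = best) can be sketched as follows; treat the formula as an assumption rather than the cited authors' exact equation.

    # Minimal sketch of rescaling a mean CSUQ rating to 0-100, assuming items
    # are coded 1 (best) to 7 (worst) as in the excerpt above. A generic linear
    # interpolation, not necessarily the exact equation from [16].
    def csuq_to_0_100(mean_rating):
        """Map a mean 1-7 rating (1 = best) to a 0-100 score (100 = best)."""
        if not 1 <= mean_rating <= 7:
            raise ValueError("Mean CSUQ rating must lie in [1, 7]")
        return 100 * (7 - mean_rating) / 6

    print(csuq_to_0_100(1.0))  # -> 100.0 (best possible experience)
    print(csuq_to_0_100(4.0))  # -> 50.0  (scale midpoint)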
Article
Full-text available
The most effective tutoring method is one-on-one, face-to-face, in-person human tutoring. However, due to the limited availability of human tutors, computer-based alternatives have been developed. These software-based alternatives are called Intelligent Tutoring Systems (ITS) and are used to tutor students in different domains. Although ITS performance is inferior to that of human teachers, the field is growing and has recently become very popular. User interfaces play a key role in the usability of an ITS. Even though ITS research has advanced, the majority of the work has concentrated on the learning sciences while mostly disregarding user interfaces. Because of this, present ITS include effective learning modules but less effective interface designs. Usability is one approach to gauge a software product's performance, and "ease of use" is one way to assess its quality. This paper measures the usability effectiveness of an ITS designed to teach Object-Oriented (OO) analysis and design concepts using the Unified Modeling Language (UML). A Computer System Usability Questionnaire (CSUQ) survey was conducted for the usability evaluation of UML-ITS. According to participants' responses to the system's usability survey, all responses lie between scale points 1 and 3, which indicates that the participants were satisfied and comfortable with most of the system's interface features.
... The SUS is widely considered "quick and dirty" (Brooke, 1996) because it enables rapid measurement, is free to use, and exhibits high reliability; these characteristics have promoted the wide use of the SUS (Lewis, 2018b). It is suitable for measuring various systems and products, including specialized systems (Martins et al., 2008), everyday products (Lewis, 2019), mobile applications (Kortum and Sorber, 2015), and graphic signs (Ng et al., 2011). SUS scores can be interpreted directly using an adjective rating scale (Bangor et al., 2008) or a curved grading scale (CGS) (Sauro and Lewis, 2014), which enables intuitive interpretation of the test results. ...
... Scenes that include all systems are difficult to render in either images or animations; therefore, to expand the range of usage scenarios in the A-SUS, we chose to show a tablet computer because it functions as both a phone and a computer and is widely used and recognizable. A tablet also fits into the usage scenario of apps and computer software, which are frequently tested using the SUS (Lewis, 2019). ...
... After data cleaning according to the method described by Sauro and Lewis (2011), we retained responses from 255 participants. After further applying the cleaning methods used for two questionnaire score differences (Lah et al., 2020;Lewis, 2019;Wang et al., 2023), we excluded seven more participants whose score differences were greater than 40 points, and ultimately retained 248 valid responses. Each participant received a payment of 3 RMB. ...
Article
This paper describes the development of a cartoon animation system usability scale (A-SUS) based on the established text-based SUS questionnaire. We propose a methodology and design a short graphic interchange format (.GIF) animation for each SUS item. Experimental evaluations confirm that the scale has satisfactory psychometric properties (e.g., structural validity, reliability, factor structure, concurrent validity, and sensitivity). A second experiment is used to evaluate and compare the questionnaire experiences associated with the SUS, a pictorial SUS (P-SUS), and the developed A-SUS. The results indicate that the A-SUS performs well in terms of recommendations, aesthetics, motivation, and completion time. Compared with the SUS and P-SUS, the animated version is more interesting, and the overall questionnaire experience is better.
... Perceived usability is a higher-level construct of usability (Lewis, 2019) and a crucial dimension of user experience metrics (Park et al., 2013;Tullis & Albert, 2016). Standardized questionnaires are some of the most commonly used tools for measuring perceived usability (Assila et al., 2016). ...
... Since the 1980s, various standardized questionnaires have been developed, including the Post-study System Usability Questionnaire (PSSUQ), Computer System Usability Questionnaire (CSUQ), System Usability Scale (SUS), Usability Metric for User Experience (UMUX), Questionnaire for User Interface Satisfaction (QUIS), and Software Usability Measurement Inventory (SUMI), among others. Studies conducted using these questionnaires have yielded comparable results and norms that enable direct and rapid understanding of usability test results (Lewis, 2019). The standardized questionnaires allow for certain manipulations, i.e., slight changes in phrasing can be introduced without affecting the measurement outcomes (Lewis, 2019). ...
Article
Full-text available
The Computer System Usability Questionnaire (CSUQ) is one of the most popular standardized usability questionnaires used to assess the perceived usability of computer software, and it has been rigorously translated into multiple languages. This article aims to complete the cross-cultural adaptation of the CSUQ for a Chinese environment and examine its psychometric characteristics. We used the forward-backward translation method to complete the initial translation and designed the psychometric experiment according to the measurement properties required by the consensus-based standards for the selection of health measurement instruments (COSMIN). The COSMIN risk of bias checklist and updated criteria for good measurement properties were used to examine the Chinese CSUQ. The results demonstrated good performance of the Chinese CSUQ in content validity, internal structure, and the remaining measurement properties. The results of the factor analysis revealed a three-factor structure, as well as some discrepancies between the Chinese CSUQ and its original version. Therefore, the Chinese CSUQ can be used confidently by Chinese usability practitioners and can also serve as an effective alternative to other standardized usability scales.
... Lewis (2018b) proposed the use of between-questionnaire difference scores to detect differences in participant responses. This is a feasible method because usability scales are general-purpose, and the high correlation between questionnaires has been confirmed by many studies (Lewis, 2018b; Lewis et al., 2015; Lewis, 2019; Wang et al., 2020b, 2021). Additionally, commonly used usability scales are very short, and adding a scale to the test does not significantly increase participant effort. ...
... Additionally, commonly used usability scales are very short, and adding a scale to the test does not significantly increase participant effort. In studies that use score difference as a checking method (Lah et al., 2020; Lewis, 2018a; Lewis, 2019), the current screening criterion is very rough, and only a limited number of studies have preliminarily reported the results of a 50-point difference screening. They did not explain the reason for using 50 points as a threshold, which may make it difficult to achieve accurate cleaning. ...
... Crowdsourcing platforms are widely used in perceived usability testing (Kortum & Bangor, 2013; Kortum & Sorber, 2015), and a large number of studies have also used anonymous online mail surveys to complete their assessments (Yang et al., 2012; Lewis, 2018a, 2019). These studies were not traditional laboratory tests, but the data quality was shown to be usable. ...
Article
Full-text available
To improve the accuracy of crowdsourced data, efficient elimination of low-quality data, commonly referred to as data cleaning, is a straightforward but powerful method. In this research, the effectiveness of various data cleaning methods was tested in two measurement environments. We used the score differences of three perceived usability questionnaires (SUS, UMUX, and mATM) to provide a new basis for accurately cleaning low-quality data. We accomplished this by observing the data cleaning effect of various score-difference intervals. Our study ultimately showed that (1) significant differences in scores are found in different measurement settings, (2) completion time is a useful indicator for detecting low-quality data, (3) the post-cleaning correlations between questionnaires show that a method combining completion time with inspection item pairs cleans more strictly than a method using only completion time, and (4) a score difference of 30 points between highly correlated perceived usability scales is a suitable cleaning threshold that is feasible for shorter questionnaires. Therefore, in the case of using a single questionnaire, a cleaning method combining completion time and inspection item pairs can be used; we also recommend simultaneously using two standardized perceived usability questionnaires, in which case a threshold score difference of 30 points between the two questionnaires can be used as a cleaning criterion.
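The score-difference cleaning rule described in this abstract can be sketched in a few lines of Python: drop a respondent when two concurrently collected 0-100 usability scores differ by more than a chosen threshold (30 points in the abstract above). The column names and example data are hypothetical.

    # Minimal sketch of score-difference data cleaning: keep only respondents
    # whose two 0-100 usability scores (e.g., SUS and UMUX) agree within a
    # threshold. Column names and example data are hypothetical.
    import pandas as pd

    def clean_by_score_difference(df, col_a="sus", col_b="umux", threshold=30):
        """Keep rows where |col_a - col_b| <= threshold (both on 0-100 scales)."""
        diff = (df[col_a] - df[col_b]).abs()
        return df[diff <= threshold].copy()

    responses = pd.DataFrame({
        "sus":  [72.5, 90.0, 35.0, 80.0],
        "umux": [70.8, 45.0, 37.5, 79.2],
    })
    print(clean_by_score_difference(responses))  # the 90.0/45.0 row is dropped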
... UMUX-LITE and a regression-adjusted SUS equivalent, UMUX-LITEr, were proposed as very short indicators of usability, consisting of the first item (item A) and the third item (item C) of UMUX. UMUX has been found to be sensitive to users' amount of exposure to the product (Berkman & Karahoca, 2016; Borsci et al., 2015; Lewis, 2019), along with other usability scales, as well as being sensitive to different applications with the same purpose of use (Berkman & Karahoca, 2016; Lewis, 2019). These studies also showed that UMUX correlates with concurrently collected CSUQ and SUS scores. ...
... The correspondence between score magnitudes was also reported in some studies (e.g., Lewis, 2019; Wang et al., 2020), or the least significant difference (LSD) is used as an indicator (Wang et al., 2022). Kortum & Bangor (2013) and Gao et al. (2020) offered a similar ranking order of the scores for evaluated apps as evidence for the validity of the translations of the same scale, based on data collected in a between-subjects design, since differences between subjects lead to differences in score magnitudes. ...
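For readers unfamiliar with the UMUX-LITE scores mentioned in these excerpts, the following sketch shows how the two 7-point items are typically rescaled to 0-100 and how the regression-adjusted UMUX-LITEr is formed. The coefficients (0.65, 22.9) are those commonly cited from Lewis, Utesch, and Maher (2013); treat them as an assumption here rather than values taken from the present paper.

    # Minimal sketch of UMUX-LITE scoring from two 1-7 item ratings (higher =
    # stronger agreement). The regression adjustment uses coefficients commonly
    # cited from Lewis, Utesch, and Maher (2013); treat them as an assumption.
    def umux_lite(item_usefulness, item_ease):
        """Rescale the two 1-7 item ratings to a 0-100 score."""
        return (item_usefulness + item_ease - 2) * (100 / 12)

    def umux_lite_r(item_usefulness, item_ease):
        """Regression-adjusted UMUX-LITE intended to approximate SUS magnitudes."""
        return 0.65 * umux_lite(item_usefulness, item_ease) + 22.9

    print(round(umux_lite(6, 6), 1))    # -> 83.3
    print(round(umux_lite_r(6, 6), 1))  # -> 77.1

Note that under this adjustment the score is confined to roughly 22.9-87.9, which is the kind of range restriction the main abstract refers to when SUS scores are well above average.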
Article
Within the last decade, the Usability Metric for User Experience (UMUX) has been translated into many languages. This study aimed to create an adapted Turkish version of the UMUX based on item translations made from the Arabic, Chinese, English, Italian, and Slovene versions, which vary in the content of their items as well as in how they are processed by machine translation systems and understood by human translators. Based on the different translations, 45 Turkish UMUX variants were assessed psychometrically as a formative construct, using criteria based on PLS measurement models for formative constructs and Rasch analysis. The results were benchmarked against data collected via the UMUX, SUS, and CSUQ in Arabic and English, or in Turkish and English, concurrently. The results show that all 45 Turkish variants of the UMUX reveal strong psychometric qualities as a formative construct, as do the English and Arabic UMUX. While there is evidence that usability can be measured as a formative construct, differences between item sets suggest that the UMUX may not be a complete measure covering all aspects of usability with regard to ISO 9241-11. Our results also suggest that UMUX scores are sensitive to the native language of participants, since the mean scores differed significantly between the native and English UMUX versions, which were completed concurrently to assess the same software.
... In most cases, the correspondence among questionnaires was not confirmed in the cross-cultural adaptation. Recent studies focusing on the English versions of the questionnaires (e.g., Borsci et al., 2015; Lah et al., 2020; Lewis, 2018b, 2019a; Lewis et al., 2015) have confirmed that there is a relationship between different standardized usability questionnaires. However, other researchers have observed differences between the psychometric properties of translated questionnaire versions and the English versions in terms of cross-cultural adaptation (Borsci et al., 2015; Gao et al., 2020; Gronier & Baudet, 2021). ...
... The SUS can be used to evaluate a wide range of products and systems (Kortum & Bangor, 2013), including various everyday products (Kortum & Bangor, 2013; Lewis, 2019a; Lewis et al., 2015; Zwakman et al., 2021), management systems (Orfanou et al., 2015), video service systems (Huang, 2020), mobile applications (Kortum & Sorber, 2015), online education platforms (Pal & Vanijja, 2020), Web applications (Ghosh et al., 2018; Shorman et al., 2021), and more. Scholars have shown that the SUS is the most popular tool for measuring usability. ...
... The mean score of the SUS is 73.61 (SD = 12.40) and the CGS grade is B-. The mean score of the UMUX is 74.58 (13.71), with a CGS grade of B. The two questionnaires are basically consistent with the scores of everyday products (Kortum & Bangor, 2013; Lewis, 2019a). Following the recommended UMUX-LITE calculation method (Lewis, 2018b), we calculated the mean score (SD) of UMUX-LITE as 74.24 (13.85) and the CGS grade as B. ...
Article
Full-text available
In this study, we explored usability structures in the Chinese environment by (i) developing a Chinese version of the modified Technology Acceptance Model (mTAM), which uses a strict translation and prediction test process with strict psychometric evaluation characteristics and (ii) examining the relationships among Chinese versions of mTAM, SUS, and UMUX; we find that the three questionnaires have a strong correspondence. Further verification revealed that the correlation between SUS and PEU is stronger than between SUS and PU. Though the three questionnaires were developed independently in the Chinese environment, all can be used to effectively measure perceived usability.
... UMUX has indeed grown popular in the years since its creation. Scholars have tested the psychometric properties of UMUX and its relationships with other standardized usability questionnaires (Lewis, 2018a, 2019); they have also translated it into other languages, including Arabic, Slovenian, and Italian. ...
... The UMUX, though it is significantly newer than the SUS, will be widely used in coming years (Lewis, 2019). It has garnered a great deal of research attention in terms of cross-cultural adaptation. ...
... There is demand for innovative, diversified testing tools (Finstad, 2013). A shorter usability questionnaire, as mentioned above, is advisable in cases of multi-attribute measurement (Lewis, 2019). ...
Article
Full-text available
A new Usability Metric for User Experience (UMUX) is translated and validated for native Chinese speakers in this study. The forward-backward translation method is applied to translate the UMUX. The results are optimized through structure-back interviews to obtain a final, unambiguous UMUX version. The Chinese version of the UMUX questionnaire is proven to have high reliability, sensitivity, and effectiveness. The correspondence between the Chinese UMUX version, UMUX-LITE, and SUS is also investigated. The results of this work may provide Chinese usability practitioners with a standardized scale after rigorous testing, as well as a closer understanding of the relationship between the three questionnaires in assisting users to make sound choices.
... Efforts to replicate these findings have led to the conclusion that treating the instrument as two-dimensional is of little practical or theoretical interest. This study, therefore, treats the SUS as a unidimensional instrument of perceived usability [32]. ...
... Regarding scaling assumptions, the dimensionality of the SUS has been evaluated previously [32]. In accordance with these studies, this study assumes that the SUS is to be treated as unidimensional (all items measure the same construct). ...
Article
Background: The Swedish health care system is undergoing a transformation. eHealth technologies are increasingly being used. The System Usability Scale is a widely used tool, offering a standardized and reliable measure for assessing the usability of digital health solutions. However, despite the existence of several translations of the System Usability Scale into Swedish, none have undergone psychometric validation. This highlights the urgent need for a validated and standardized Swedish version of the System Usability Scale to ensure accurate and reliable usability evaluations. Objective: The aim of the study was to translate and psychometrically evaluate a Swedish version of the System Usability Scale. Methods: The study utilized a 2-phase design. The first phase translated the System Usability Scale into Swedish and the second phase tested the scale's psychometric properties. A total of 62 participants generated a total of 82 measurements. Descriptive statistics were used to visualize participants' characteristics. The psychometric evaluation consisted of data quality, scaling assumptions, and acceptability. Construct validity was evaluated by convergent validity, and reliability was evaluated by internal consistency. Results: The Swedish version of the System Usability Scale demonstrated high conformity with the original version. The scale showed high internal consistency with a Cronbach α of .852 and corrected item-total correlations ranging from 0.454 to 0.731. The construct validity was supported by a significant positive correlation between the System Usability Scale and domain 5 of the eHealth Literacy Questionnaire (P = .001). Conclusions: The Swedish version of the System Usability Scale demonstrated satisfactory psychometric properties. It can be recommended for use in a Swedish context. The positive correlation with domain 5 of the eHealth Literacy Questionnaire further supports the construct validity of the Swedish version of the System Usability Scale, affirming its suitability for evaluating digital health solutions. Additional tests of the Swedish version of the System Usability Scale, for example, in the evaluation of more complex eHealth technology, would further validate the scale.
... The presented study employed the SUS survey for assessing usability because it has high reliability and validity for usability assessment, as verified in the literature, where the reported Cronbach's alpha was frequently larger than 0.80 and, in most of the investigated papers, even beyond 0.90, as in (Al-mayyan and Al-Refai, 2020; Kaya et al., 2019; Hoehle and Venkatesh, 2015; Finstad, 2010; Alhadreti, 2021; Lewis, 2019). The SUS survey contained ten statements. ...
... [Figure captions: System Usability Scale (SUS) score ratings, source: Bangor et al. (2008); the Sauro/Lewis curved grading scale (Lewis, 2019)] ...
... The CSUQ has been applied to many everyday systems, notably including Google, Gmail, Amazon, Microsoft Word, and Excel [27]. These evaluations have made it possible to continue research on the psychometric validation of the CSUQ in comparison with other usability questionnaires. ...
... Finally, some recent research has begun to consider how to interpret the CSUQ score [26,27], as has already been done for the SUS [5]. The objective of this research is to map a qualifying adjective to a score obtained from a scale, so that the score can readily assign a quality label to a system (good, very good, poor, awful, etc.). ...
... The metrics used for measurement were the SUS [18] and the UEQ-S [19]. One of the main uses of the SUS is to decide whether a product is acceptable or not [20]. Although the SUS yields scores in the range 0 to 100, a high score does not necessarily mean the product is top-rated. ...
... Another area for improvement is the number of respondents. Compared with other works such as the one described in [20], this work needs to increase its number of respondents. ...
Article
Full-text available
Public displays have the potential to serve as a medium for disseminating information to the public. Unfortunately, most public displays tend to be ignored by passers-by. Several studies have shown that users will look at a public display if they can interact with it. One such approach is Screen-Smart Device Interaction (SSI). In this research, an interaction was developed for presenting a file from a smart device on a display using a chat-based interface. Testing was carried out with the SUS and UEQ-S instruments. The tests led to the conclusion that the developed result falls into the acceptable category, with a SUS score of 72.8. However, the UEQ-S yielded a hedonic score lower than the pragmatic score, meaning that further development is needed regarding the appearance of the tested prototype.
... Among the reasons the CSUQ was chosen as the evaluation instrument was that the System Usability Scale (SUS), the instrument most widely used in mobile health usability studies, has not been validated in Spanish. Another reason for its use is the strength of the CSUQ for measuring usability in field studies [23]. ...
... What do these domains have in common? From a theoretical point of view, it has been suggested that single-item scales yield valid measures of a construct whenever the to-be-measured construct and its attributes can be regarded to concern entities that are concrete and singular in the sense that (a) all raters understand which entity is being rated and (b) what is being rated is reasonably homogeneous (Rossiter 2002). In the case of perceived usability, it seems plausible that condition (a) is usually met. The same conclusion seems to hold for condition (b) given that the System Usability Scale can be treated as essentially unidimensional (Berkman and Karahoca 2016; Gräve and Buchner 2024; Kortum, Acemyan, and Oswald 2021; Lewis 2019; Lewis and Sauro 2017; Sauro 2018). In essence, then, the present results nicely fit the theoretical framework suggested by Rossiter (2002). ...
Article
Single-item scales of perceived usability are attractive due to their efficiency, and non-verbal scales are attractive because they enable collecting data from individuals irrespective of their language proficiency. We tested experimentally whether single-item verbal and pictorial scales can compete with their 10-item counterparts at reflecting the difference in usability between well-designed and poorly designed systems. N = 1079 (Experiment 1) and N = 1092 (Experiment 2) participants worked with two systems whose usability was experimentally manipulated. Perceived usability was assessed using the 10-item System Usability Scale, the single-item Adjective Rating Scale, the 10-item Pictorial System Usability Scale and the Pictorial Single-Item Usability Scale. The single-item scales reflect the difference in usability as well as their 10-item counterparts do. The pictorial scales are nearly as valid as their verbal counterparts. The single-item Adjective Rating Scale and the Pictorial Single-Item Usability Scale are thus efficient and valid alternatives to their 10-item counterparts.
... The SUS questionnaire was based on Brooke [6], modified to suit the context of the study, and comprised 10 questions. The SUS, with its simple administration, is suitable for small sample sizes, yielding reliable results, and has undergone validation to distinguish between usable and unusable systems [23]. To counteract potential response bias and acquiescent bias among respondents, five questions were phrased in a negative manner. ...
Article
Full-text available
This study presents a comprehensive evaluation of a virtual reality-based testing station designed for flexible manufacturing systems. Given the intricate nature of flexible manufacturing systems and the demand for precision in learning, the integration of virtual reality emerges as a promising approach to enhance both student competence and engagement. By employing a combined assessment with the System Usability Scale and heuristic evaluation conducted by 36 students and 5 experts, respectively, the virtual reality-based testing station achieved an average usability score of 72.78, indicating good usability. Noteworthy heuristic challenges, particularly in the domains of ‘Realistic Feedback’ and ‘Navigation and Orientation Support,’ have been identified, providing valuable insights for potential refinements to the testing station. The outcomes of this study not only guide immediate improvements but also pave the way for future research endeavors aimed at elevating the learning outcomes in flexible manufacturing systems courses.
... Note that although the presented measures differ in their result value ranges, they can all be mapped to the same range for comparison. Moreover, several studies have confirmed that they are highly correlated with one another; see, e.g., [9] and the works cited therein. ...
... A parallel analysis (Horn, 1965) using principal-component extraction and retaining all factors corresponding to eigenvalues greater than the 95th percentile of the reference eigenvalues (Auerswald & Moshagen, 2019) consistently revealed one significant dimension for both the SUS (Experiment 1: eigenvalue = 6.37, explaining 64% of the variance; Experiment 2: eigenvalue = 7.19, explaining 72% of the variance) and the UMUX (Experiment 1: eigenvalue = 2.49, explaining 62% of the variance; Experiment 2: eigenvalue = 2.84, explaining 71% of the variance). We conclude that both the SUS and the UMUX can be treated as essentially unidimensional, which is consistent with recent conclusions by others (Berkman & Karahoca, 2016; Kortum et al., 2021; Lewis, 2019; Lewis & Sauro, 2017). We consider this to be an indicator of the homogeneity of the construct of perceived usability in the sense of Rossiter (2002). ...
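The parallel-analysis procedure summarized above can be sketched as follows: compare the observed correlation-matrix eigenvalues with the 95th percentile of eigenvalues obtained from random data of the same shape, and retain components only as long as the observed values stay above that reference. This is a generic illustration, not the cited authors' code, and the simulated example data are hypothetical.

    # Minimal sketch of Horn's parallel analysis with principal-component
    # extraction and a 95th-percentile reference, as summarized above.
    import numpy as np

    def parallel_analysis(data, n_iterations=1000, percentile=95, seed=0):
        """Return the number of components to retain for an (n_obs x n_items) matrix."""
        rng = np.random.default_rng(seed)
        n_obs, n_items = data.shape
        observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

        random_eigs = np.empty((n_iterations, n_items))
        for i in range(n_iterations):
            random_data = rng.standard_normal((n_obs, n_items))
            random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[::-1]
        reference = np.percentile(random_eigs, percentile, axis=0)

        retained = 0
        for obs, ref in zip(observed, reference):
            if obs > ref:
                retained += 1
            else:
                break
        return retained

    # Simulated ratings driven by a single latent factor plus noise
    rng = np.random.default_rng(1)
    latent = rng.standard_normal((300, 1))
    ratings = latent + 0.8 * rng.standard_normal((300, 10))
    print(parallel_analysis(ratings))  # typically -> 1 (essentially unidimensional)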
Article
Full-text available
Objective: In usability studies, the subjective component of usability, perceived usability, is often of interest besides the objective usability components, efficiency and effectiveness. Perceived usability is typically investigated using questionnaires. Our goal was to assess experimentally which of four perceived-usability questionnaires differing in length best reflects the difference in perceived usability between systems. Background: Conventional measurement wisdom strongly favors multi-item questionnaires, as measures based on more items supposedly yield better results. However, this assumption is controversial. Single-item questionnaires also have distinct advantages, and it has been shown repeatedly that single-item measures can be viable alternatives to multi-item measures. Method: N = 1089 (Experiment 1) and N = 1095 (Experiment 2) participants rated the perceived usability of a good or a poor web-based mobile phone contract system using the 35-item ISONORM 9241/10 (Experiment 1 only), the 10-item System Usability Scale (SUS), the 4-item Usability Metric for User Experience (UMUX), and the single-item Adjective Rating Scale. Results: The Adjective Rating Scale represented the perceived-usability difference between the two systems at least as well as, or significantly better than, the multi-item questionnaires (significantly better than the UMUX and the ISONORM 9241/10 in Experiment 1, significantly better than the SUS in Experiment 2). Conclusion: The single-item Adjective Rating Scale is a viable alternative to multi-item perceived-usability questionnaires. Application: Extremely short instruments can be recommended to measure perceived usability, at least for simple user interfaces that can be considered concrete-singular in the sense that raters understand which entity is being rated and what is being rated is reasonably homogenous.
... (CSUQ) (Lewis, 2018), Questionnaire for User Interface Satisfaction (QUIS) (Chin et al., 1988), Usefulness, Satisfaction, and Ease of use (USE) (Lund, 2001), and Perceived Usefulness and Ease of Use (PUEU) (Davis, 1989). The questionnaire is shown in the left part of Fig. 5. ...
Article
Full-text available
Classroom monitoring using information and communications technology (ICT) plays a significant role in enhancing teaching and learning in a blended learning environment. Learning analytics (LA) is one such popular classroom monitoring tool. LA helps teachers with the collection, interpretation, and analysis of student performance data generated during the teaching and learning process. However, designing an interactive LA system for real-time use in blended classrooms is challenging. We observed significant flaws in how a state-of-the-art system handles the challenges of monitoring large classrooms. To address those flaws and challenges, we propose "Manas Chakshu", a real-time ICT-based interactive LA system for large blended classrooms. The core idea of the visual LA system consists of two interactive levels: the overview and overview+detail levels are used to optimize the screen-area utilization of the available display. The system computes classroom status with the help of novel weighted states and Euclidean distance for highlighting critical classroom regions. We compared the theoretical performance of our aid with the state-of-the-art system and found that Manas Chakshu performed better in 89.12% of the cases based on theoretical performance analysis. We also implemented Manas Chakshu as an Android application and conducted an empirical study with 39 teachers. The study results show that our system reduces average classroom monitoring time by 27.96% compared with the state-of-the-art system. We found that the perceived usability of the proposed system, in terms of teachers' satisfaction, efficiency, and learnability ratings, was high: the average mean ratings on a five-point Likert scale were 4.06, 3.86, and 4.02, respectively. These perceived usability ratings and the high System Usability Scale (SUS) scores (average 76.86) show that the proposed system was highly acceptable to the teachers.
... To illustrate, consider item 3 in PSSUQ Version 3, which states, "I was able to complete the tasks and scenarios quickly using this system," whereas item 3 in CSUQ Version 3 states, "I am able to complete my work quickly using this system." Figure 1 displays the current version of the CSUQ, which comprises 16 items (Lewis, 2019). Measuring a system's overall usefulness is one of the purposes of usability evaluation. ...
Article
Full-text available
This quantitative study assessed user satisfaction and usability of a university portal, specifically the Online Portal Empowered Netizens (OPEN), utilizing the Computer System Usability Questionnaire (CSUQ). The study involved distributing a modified version of the CSUQ to 185 college students from Mindanao State University Lanao del Norte Agricultural College. The questionnaire measured four categories: Student portal Usefulness (SYSUSE), Information Quality (INFOQUAL), Interface Quality (INTERQUAL), and Overall Usability (OVERALL). Data was collected through an online questionnaire, and the responses were processed using Microsoft Office Excel. The study found that the student portal performed well in terms of usefulness, information quality, interface quality, and overall usability, positively impacting students' academic experience. However, it is important to note that the findings are limited to the specific university and may not be generalizable to other institutions. The study recommends continuous monitoring and updating of the portal to ensure its relevance and user-friendliness, as well as seeking regular user feedback to identify areas for improvement. Overall, this study provides valuable insights into the usability of a university portal and offers recommendations for enhancing the user experience.
... Continuing the investigation into the relationships among various measures of perceived usability (Lah et al., 2020; Lewis, 2018a, 2019b), the major goals of the current paper were to replicate and extend the Lah et al. models with a new dataset that had some variation in the independent (driver) and dependent (outcome) variables. ...
Article
We replicated and extended previous research to investigate the extent to which perceived ease of use and perceived usefulness account for variation in overall experience, likelihood to recommend, intention to use, and reported usage in a three-month follow-up. Consistent with previous research, we found little effect on structural equation models from varying three measures of perceived ease and two measures of perceived usefulness. All models had statistically significant standardized estimates and squared multiple correlations and had acceptable fit statistics. Despite these manipulations, the models supported a consistent narrative. Both perceived ease and perceived usefulness are important antecedents that either directly or indirectly affect the experiential and intentional outcomes (perceived usefulness somewhat more than perceived ease), with intention to use accounting for 19% of variation in follow-up ratings of usage. These models support UX practitioners by demonstrating the importance of work that improves perceptions of product ease and usefulness and showing that the two-item UX-Lite questionnaire is an effective and efficient measure of perceived ease and usefulness.
... System Usability Scale Curved Grading Scale[25] ...
Article
Full-text available
This study aims to improve a job search web application so that it not only addresses usability problems but also surpasses user expectations. A recently released job search web application was found to have usability problems during interviews with the Information Technology division. To measure the usability of the application and provide recommendations for improvement, the study uses the Goal-Directed Design framework, the Performance Measurement method, and the System Usability Scale. The evaluation was conducted twice, with the first assessment identifying problems and the second evaluation measuring the effectiveness of the recommendations made. The website prototype was developed and passed all test scenarios with an A+ grade. The modifications achieved an important level of effectiveness, earning an A+ grade with an 85% effectiveness rate. Furthermore, the website received exceptional user satisfaction ratings, with an A+ rating and a usability satisfaction score of 85.5.
... UMUX-LITE is a short questionnaire that rates the perceived usefulness (PU) and the perceived ease of use (PEU) of a system on a seven-point Likert scale and then combines them to a score ranging from 0 to 100. Table 1 shows the UMUX-LITE scores and the corresponding school grades on the Sauro/Lewis curved grading scale (CGS) [50]. ...
Article
Full-text available
Manual repair tasks in the industry of maintenance, repair, and overhaul require experience and object-specific information. Today, many of these repair tasks are still performed and documented with inefficient paper documents. Cognitive assistance systems have the potential to reduce costs, errors, and mental workload by providing all required information digitally. In this case study, we present an assistance system for object-specific repair tasks for turbine blades. The assistance system provides digital work instructions and uses augmented reality to display spatial information. In a user study with ten experienced metalworkers performing a familiar repair task, we compare time to task completion, subjective workload, and system usability of the new assistance system to their established paper-based workflow. All participants stated that they preferred the assistance system over the paper documents. The results of the study show that the manual repair task can be completed 21% faster and with a 26% lower perceived workload using the assistance system.
... The sum of all System Usability Scale item scores was multiplied by 2.5, yielding a score ranging between 0 and 100, with higher scores being better. A usability score > 68 is considered above average, meaning that the user agrees the exergaming device is easy to use and that he or she feels confident using the exergaming device (Lewis, 2019). ...
Article
Impaired upper extremity (UE) function has limited activities of daily living in people with systemic sclerosis (SSc). Exergaming, a combination of gaming and exercises, could be a novel way to improve UE exercise engagement. The objective of this study was to examine the usability of exergaming and to investigate participant experiences after exergaming among people with SSc. Both quantitative and qualitative data were collected. Participants completed questionnaires regarding the usability of exergaming. Semi-structured interviews were conducted directly after exergaming. Descriptive statistics and thematic content analysis were performed. Twenty participants with SSc participated. Exergaming was highly acceptable with a good System Usability Scale score (M = 71.6 ± 9.9). Participants described exergaming as motivating with potential physical and nonphysical benefits. Although results were generally positive, participants expressed some barriers and temporary side effects of using exergaming and needs for improvement. This work stands to inform future exergaming interventions in people with SSc.
... When employees are included in a test group, it increases their self-efficacy and allows feedback on the usability of the program. As usability is evaluated, it minimizes risk and improves quality (Deraman & Salman, 2019; Lewis, 2019). Research has shown that professional development impacts self-efficacy, such as teachers teaching science, technology, engineering, math, and computing (Gardner et al., 2019; Rich et al., 2017). ...
... UMUX-LITE is a short questionnaire that rates the perceived usefulness (PU) and the perceived ease of use (PEU) of a system on a seven-point Likert scale and then combines them to a score ranging from 0 to 100. Table I shows the UMUX-LITE scores and the corresponding school grades on the Sauro/Lewis curved grading scale (CGS) [50]. ...
Preprint
Full-text available
Manual repair tasks in the industry of maintenance, repair, and overhaul require experience and object-specific information. Today, many of these repair tasks are still performed and documented with inefficient paper documents. Cognitive assistance systems have the potential to reduce costs, errors, and mental workload by providing all required information digitally. In this case study, we present an assistance system for object-specific repair tasks for turbine blades. The assistance system provides digital work instructions and uses augmented reality to display spatial information. In a user study with ten experienced metalworkers performing a familiar repair task, we compare time to task completion, subjective workload, and system usability of the new assistance system to their established paper-based workflow. All participants stated that they preferred the assistance system over the paper documents. The results of the study show that the manual repair task can be completed 21 % faster and with a 26 % lower perceived workload using the assistance system.
... [Table excerpt: Perceived Usability (the perceived high degree of control and low effort spent in the interaction), evaluated with scales such as SUS, UMUX, and CSUQ [16]; Aesthetic Appeal (AE)] ...
Chapter
Full-text available
The COVID-19 pandemic has had, and will continue to have, an unprecedented impact on museums and exhibition galleries worldwide, with online visits to museums and exhibitions increasing significantly. The most common method used by web user experience researchers to study user engagement is the questionnaire, usually administered after the user has completed the website experience and relying on the user's memory and lingering feelings. Therefore, the purpose of this paper is to propose a new assessment method based on a combination of user electroencephalography (EEG) signals and a self-assessment questionnaire (UES-SF). Since EEG signal measurement is a practical method for detecting sequential changes in brain activity without significant time delays, it can capture visitors' unconscious and sensory responses to online exhibitions. This paper used the Google Arts & Culture (GA&C) website as an example to study 4 different exhibition formats and their impact on user engagement. The questionnaire results showed that "game interaction" scored significantly higher (p < 0.05) in terms of participation than the "2D information kiosk" and "3D virtual exhibition" formats, and marginally significantly higher (0.05 < p < 0.10) than "video explanation". However, when we combined the EEG data, we could determine that "game interaction" had the highest user engagement, followed by "video explanation", "3D virtual exhibition", and the "2D information kiosk". Therefore, our new evaluation approach can assist online exhibition user experience researchers in understanding the impact of different forms of interaction on engagement more comprehensively.
... Comparing the obtained results to those reported in the literature, edCrumble, a visual authoring tool for blended learning, was evaluated using UMUX by a group of 56 users at 5.208 on average (UMUX.Q1 at 5.428, UMUX.Q2* at 5.534, UMUX.Q3 at 5.527, and UMUX.Q4* at 4.341) [45] (calculated based on Table V therein), i.e., below AuthorKit on average and in all considered aspects except ease of use. Taking a wider perspective, Lewis reported usability evaluation results for four widely used software products: Excel (UMUX average of 5.176, n = 390), Word (5.53, n = 453), Amazon (6.088, n = 338), and Gmail (5.68, n = 256) [46] (the percentages reported in Table 8 therein were remapped to the 1-7 scale). In this context, AuthorKit's usability (evaluated at 5.42 on average) could be ranked between Excel's and Word's. ...
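The remapping mentioned in this excerpt (0-100 percentages converted to the 1-7 item scale) can be sketched with a simple linear interpolation; the cited authors' exact procedure may differ, so treat this as an assumption.

    # Minimal sketch of remapping a 0-100 score to a 1-7 scale, assuming a
    # simple linear interpolation; the cited authors' procedure may differ.
    def percent_to_1_7(percent):
        """Map a 0-100 score to the equivalent point on a 1-7 scale."""
        if not 0 <= percent <= 100:
            raise ValueError("Score must lie in [0, 100]")
        return 1 + 6 * percent / 100

    print(percent_to_1_7(69.6))  # -> 5.176, consistent with the Excel value quoted above under this mapping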
Article
Full-text available
E-learning tools are gaining increasing relevance as facilitators in the task of learning how to program. This is mainly a result of the pandemic situation and consequent lockdown in several countries, which forced distance learning. Instant and relevant feedback to students, particularly if coupled with gamification, plays a pivotal role in this process and has already been demonstrated as an effective solution in this regard. However, teachers still struggle with the lack of tools that can adequately support the creation and management of online gamified programming courses. Until now, there was no software platform that would be simultaneously open-source and general-purpose (i.e., not integrated with a specific course on a specific programming language) while featuring a meaningful selection of gamification components. Such a solution has been developed as a part of the Framework for Gamified Programming Education (FGPE) project. In this paper, we present its two front-end components: FGPE AuthorKit and FGPE PLE, explain how they can be used by teachers to prepare and manage gamified programming courses, and report the results of the usability evaluation by the teachers using the platform in their classes.
... Finally, we administered a survey to medical staff to assess the ease of use of MAF. The survey used in our investigation was adapted from the CSUQ (Computer System Usability Questionnaire) [23]. This method enabled us to assess and validate our framework. ...
... Since its introduction in 1986, the ten-item SUS has been assumed to be unidimensional [51]. Because a distinction based on item tone is of little practical or theoretical interest, it is also recommended that user experience practitioners and researchers generally treat the SUS as a unidimensional measure of perceived usability [52]. Meanwhile, it must be pointed out that the SUS has ten five-point items with alternating positive and negative tones, which can, to a certain extent, reduce the bias that the framing effect [53] causes in users' responses. ...
Article
Since the early 2000s, information systems have been widely employed across hospitals in China, changing the way in which processes are managed, improving customer satisfaction, and strengthening business competence. Intelligent guidance systems for patients (IGSP), which exhibit humanoid characteristics using artificial intelligence, assist patients in wayfinding and in obtaining medical guidance, consultations, and other medical services, and can improve user experiences before, during, and after hospital visits. However, despite their widespread adoption, usability studies on such systems are scarce. To date, there is no practical or standardized measurement of system usability, leading to difficult inspection, maintenance, and servicing processes. In this article, we aim to determine the usability deficiencies of IGSP and understand how various factors influence user satisfaction during their use. We employ the requirements set out in the ISO 9241-11:2018 standard using two inspection methods with three experts and 346 valid end-users. First, a heuristic evaluation method is employed to detect usability problems and to identify violations of Nielsen's ten heuristic principles. Second, a system usability scale is applied to evaluate participants' satisfaction with IGSP. Finally, analysis of variance tests and multiple linear regression analyses are performed to establish the correlations between user satisfaction and user characteristics. The results show that a total of 78 problems violated the heuristic principles 169 times. These are divided into five categories: voice interaction, in-hospital navigation, medical consultation, interactive interface design, and miscellaneous. This article contributes to the existing literature on new technologies in healthcare organizations, demonstrating that IGSP can improve customer satisfaction during hospital visits.
... SUS is one of the most popular questionnaires used for measuring usability in a graphical user interface (GUI) environment. Extant research has shown that the SUS has excellent reliability (typically the alpha coefficient exceeds 0.90) and validity, and is sensitive to a wide variety of independent variables [12,13]. Thus, the efficacy of the SUS as a usability measuring tool is well established. ...
Article
Full-text available
Currently, the use of voice assistants has been on the rise, but a user-centric usability evaluation of these devices is a must for ensuring their success. The System Usability Scale (SUS) is one such popular usability instrument in a Graphical User Interface (GUI) scenario. However, there are certain fundamental differences between GUI and voice-based systems, which make it uncertain whether the SUS is suitable in a voice scenario. The present work has a twofold objective: to check the suitability of the SUS for usability evaluation of voice assistants and to develop a subjective scale in line with the SUS that considers the unique aspects of voice-based communication. We call this scale the Voice Usability Scale (VUS). To fulfill these objectives, a subjective test was conducted with 62 participants. An exploratory factor analysis suggests that the SUS has a number of drawbacks for measuring voice usability. Moreover, for the VUS, the most optimal factor structure identifies three main components: usability, affective, and recognizability and visibility. The current findings should provide an initial starting point to form a useful theoretical and practical basis for the subjective usability assessment of voice-based systems.
... A higher score is an indication of better usability. To date, numerous studies have employed the SUS as a tool for usability evaluation, and it is highly regarded by usability experts, specifically in a GUI context [5, 18, 19]. ...
... Usability evaluation studies are mainly focused on measuring the effectiveness and efficiency of interactive products considering task-related metrics (e.g., time on task, task success rate etc.) as the pivotal point of analysis (Hornbaek & Law, 2007;Sauro & Lewis, 2016). System Usability Scale (SUS) (Brooke, 1996) and User Engagement Scale (UES) (O'Brien & Toms, 2010) are instruments that can efficiently measure the perceived usability of a system (Lewis, 2019) but they cannot directly capture qualitative aspects of interaction, such as emotions. ...
Article
HCI researchers and practitioners are increasingly using physiological data to measure User eXperience (UX) parameters. The dynamic nature of physiological data offers a continuous window for an in-depth understanding of users’ interaction experience. However, in order to be truly informative, physiological signals need to be linked to users’ interaction experience aspects, such as their emotional states, in a systematic and efficient way. Studies have shown that skin conductance is a physiological signal highly associated with stress. The main purpose of this paper is to present the validation study of our proposed stress detection mechanism which is integrated into a software named PhysiOBS. PhysiOBS is an observation analysis tool that can be used in the post-study analysis phase. PhysiOBS uses nonspecific skin conductance responses (NS-SCRs) in order to auto-report time periods that are probably associated with a problematic interaction. PhysiOBS can also combine multiple data sources. Hence, UX evaluators are able to further investigate a recorded session in order to reveal additional interaction flaws. The integrated stress assessment mechanism, which uses four trained classifiers, can be applied in the reported periods (auto/expert-reported) in order to classify them as stress or non-stress. For the purpose of the validation study, 24 users were recruited in order to participate in a lab experiment. Results showed that our stress assessment mechanism supports UX evaluators by accurately identifying stressful regions within an interaction scenario.
... Factor analysis (unrestricted least squares) of this study's CSUQ ratings proved consistent with prior studies (Berkman & Karahoca, 2016; Lewis, 1995, 2002, 2019). Items 1-6 aligned on factor one, 7-12 on factor two, and 13-15 on factor three. Figure 3 displays the average response of 187 faculty members to each item of the SUS questionnaire. ...
Article
Educational institutions currently favor the adoption and use of modern information and communication technology for teaching and learning. As a result, many learning management systems have been developed over recent years and established in education systems. This paper reports the results of a study that examines the usability of the Blackboard system from the perspective of the academic members of Umm Al-Qura University in Saudi Arabia, using two of the most commonly used measures of perceived usability, namely the SUS and the CSUQ. It also examines the association between these measures and the effect of faculty demographic attributes on their scores. The results of the study revealed that the usability of Blackboard at the institution is inadequate and needs further enhancement. They also showed that the SUS and CSUQ questionnaires correlate highly and largely appear to be measuring the same thing, presumably perceived usability. The results also indicated a significant effect of Blackboard frequency of use on both SUS and CSUQ scores.
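A minimal sketch of the SUS-CSUQ association check mentioned above, assuming paired per-respondent scores that are already on comparable 0-100 scales; the numbers are invented placeholders, not data from the study.

```python
# Minimal sketch: correlation between paired SUS and CSUQ scores from the
# same respondents. The arrays below are invented placeholder data.
import numpy as np
from scipy import stats

sus = np.array([72.5, 65.0, 80.0, 55.0, 90.0, 62.5, 77.5, 70.0])
csuq = np.array([70.0, 60.0, 85.0, 50.0, 88.0, 66.0, 75.0, 72.0])  # assumed already rescaled to 0-100

r, p = stats.pearsonr(sus, csuq)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```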
... Our study provides only limited information with respect to the effect of guidelines for optimizing documentation on perceived usability. Future research should use more sophisticated instruments to evaluate perceived usability, e.g., standardized questionnaires such as the System Usability Scale (SUS) or the Computer System Usability Questionnaire (CSUQ), both of which have been found to be informative for a wide range of products [38]. ...
Conference Paper
Full-text available
The growing importance of APIs creates a need to support developers with effective documentation. Prior research has generated important findings regarding information needs of developers and expectations they form towards API documentation. Several guidelines have been proposed on the basis of these findings, but evidence is lacking whether such guidelines actually lead to better documentation. This paper contributes the results of an empirical test that compared the performance of two groups of developers working on a set of pre-defined tasks with an API they were unfamiliar with. One group had access to documentation which was optimized following guidelines for API documentation design proposed in the literature whereas the other group used non-optimized documentation. Results show that developers working with optimized documentation made fewer errors on the test tasks and were faster in planning and executing the tasks. We conclude that the guidelines used in our study do have the intended effect and effectively support initial interactions with an API.
Article
Full-text available
A user-friendly mobile application is a key element in maintaining user satisfaction in the digital era. MyTelkomsel, one of the leading mobile applications in Indonesia, faces challenges in ensuring its usability as it develops increasingly complex features. This study aims to evaluate the usability of the MyTelkomsel application using the System Usability Scale (SUS) method, a quantitative approach for assessing efficiency, effectiveness, and user satisfaction. A total of 56 respondents, consisting of new and long-term users, participated in the study. The results show an average SUS score of 72, indicating a good level of usability, although specific areas such as functional consistency and ease of use require improvement. New users reported higher satisfaction, with an average score of 75, compared to long-term users with an average score of 70. The study provides insights for further development aimed at improving the user experience through application design improvements.
Article
Background Alzheimer disease (AD) is the leading cause of dementia worldwide. With aging populations and limited access to effective treatments, there is an urgent need for innovative markers to support timely preventive interventions. Emerging evidence highlights spatial cognition (SC) as a valuable source of cognitive markers for AD. This study presents NavegApp, a serious game (SG) designed to assess 3 key components of SC, which show potential as cognitive markers for the early detection of AD. Objective This study aimed to determine the content validity and usability perception of NavegApp across multiple groups of interest. Methods A multistep process integrating methodologies from software engineering, psychometrics, and health measurement was implemented to validate the software. Our approach was structured into 3 stages, guided by the software life cycle for health and the Consensus-Based Standards for the Selection of Health Status Measurement Instruments (COSMIN) recommendations for evaluating the psychometric quality of health instruments. To assess content validity, a panel of 8 experts evaluated the relevance and representativeness of tasks included in the app. In addition, 212 participants, categorized into 5 groups based on their clinical status and risk level for AD, were recruited to evaluate the app’s digital ergonomics and usability at various stages of development. Complementary analyses were performed to identify group differences and to explore the association between task difficulty and user agreeableness. Results NavegApp was validated as a highly usable tool by both experts and users. The expert panel confirmed that the tasks included in the game were representative (Aiken V=0.96-1.00) and relevant (Aiken V=0.96-1.00) for measuring SC components. Both experts and nonexperts rated NavegApp’s digital ergonomics positively, with minimal differences between groups (rrb 0.08-0.29). Differences in usability perceptions were observed among participants with sporadic mild cognitive impairment compared to cognitively healthy individuals (rrb 0.26-0.29). A moderate association was also identified between task difficulty and user agreeableness (Cramér V=0.37, 95% CI 0.28-0.54). Conclusions NavegApp is a valid and user-friendly SG designed for SC assessment, developed by integrating software engineering and psychometric evaluation methodologies. While the results are promising, further studies are warranted to evaluate its diagnostic accuracy and construct validity. This work outlines a comprehensive framework for SG development in cognitive assessment, emphasizing the importance of incorporating psychometric validity measures from the outset of the design process.
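For readers unfamiliar with the Aiken V statistic reported above, the sketch below computes it from a set of expert ratings using the standard formula V = sum(r_i - lo) / (n * (k - 1)); the ratings and scale bounds are assumptions for illustration, not the study's data.

```python
# Hedged sketch of Aiken's V for expert content-validity ratings.
# Standard formula: V = sum(r_i - lo) / (n * (k - 1)), where lo is the lowest
# possible rating and k is the number of rating categories. Ratings are invented.
def aikens_v(ratings, lo=1, k=5):
    n = len(ratings)
    return sum(r - lo for r in ratings) / (n * (k - 1))

expert_ratings = [5, 5, 4, 5, 5, 5, 4, 5]  # 8 hypothetical experts, 1-5 scale
print(f"Aiken's V = {aikens_v(expert_ratings):.2f}")
```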
Article
Objective People with systemic sclerosis (SSc or scleroderma), a rare chronic autoimmune disease, often face significant physical and emotional challenges. Peer mentoring, where someone with similar lived experiences offers guidance and support, shows promise in enhancing the well-being of recipients and may benefit individuals with systemic sclerosis. This study aims to evaluate the feasibility and potential health effects of peer mentoring through a digital platform for people with systemic sclerosis. Methods We conducted a one-group study to evaluate a 16-week peer mentoring program for people with systemic sclerosis. Mentors and mentees were matched by demographics and systemic sclerosis characteristics. Feasibility was evaluated using Orsmond and Cohn criteria: recruitment, data collection, acceptability, available resources, and participant responses to the program. Perceptions and usability of the peer mentoring program through a digital platform were assessed at week 16 (post-program). The health effects of peer mentoring were measured at baseline, week 8, and week 16. Results Five trained mentors and 15 mentees were enrolled. Each mentor was paired with 2–4 mentees. We found that peer mentoring through a digital platform was feasible, acceptable, and had good usability for both mentors and mentees. Mentees reported significantly less anxiety at week 16 ( p < 0.001). Other improvements in fatigue, pain interference, depressed mood, and resilience were observed, but did not reach statistical significance. Conclusion The peer mentoring program through a digital platform was well-received. Results provided preliminary support for the feasibility and potential health benefits of peer mentoring to enhance well-being in people with systemic sclerosis. Findings lay the groundwork for future peer mentoring research in systemic sclerosis.
Article
Full-text available
Many solutions have been proposed to help people get their food or recipe ingredients easily without having to go to the market in person, especially people who have limited time and energy to go out and buy groceries. However, existing solutions still lack intuitive ways for users to get the ingredient information of a food recipe and still do not give customers an all-in-one way to obtain the needed ingredients. This study aims to combine machine learning modalities and an image recognition approach to provide a mobile app that helps people find food recipes intuitively and provides a digital ecosystem that can accommodate collaboration between various business actors, including MSMEs (Micro, Small, and Medium Enterprises). Instead of users buying items separately, the unique value of the proposed app (BahanbaKu) is that all of the ingredients needed by the user are delivered in a single package. The preliminary evaluation shows that the accuracy of the proposed app is promising for assisting people in getting recipe information for a particular food. The proposed app is also considered intuitive by users, with an SUS score of 81.7.
Article
Purpose: Wheelchair users experience many barriers to physical activity, as affordable and accessible exercise equipment options are limited. Thus, the home-based adapted rower (aROW) and gym-based aROW were developed. The objectives were to determine: 1) wheelchair users' preferences, perspectives, facilitators, and barriers to using the home-based versus the gym-based aROW, 2) perceived usability of the home and gym aROWs, and 3) recommendations to adapt the aROW further for home and community use. Materials and methods: In this two-phase exploratory mixed-methods study, participants completed one month of using a home aROW, followed by one month of using a community gym aROW. After each phase, participants completed a semi-structured interview and the System Usability Scale (SUS) questionnaire. Interview data were analyzed using conventional content analysis, and an effect size comparing SUS data was calculated. Results and conclusions: Four categories were identified: what worked well, barriers to using the aROWs, what could be improved, and important considerations. There was a large effect size in perceived usability between the aROWs, with participants preferring the home aROW. Overall, rowing was enjoyable, and participants achieved positive physical outcomes. As preferences are individual, the home aROW provides wheelchair users with a potential choice between home or gym exercise.
Chapter
Full-text available
Usability and user experience (UX) are important concepts in the design and evaluation of products or systems intended for human use. This chapter introduces the fundamentals of design for usability and UX, focusing on the application of science, art, and craft to their principled design. It reviews the major methods of usability assessment, focusing on usability testing. The concept of UX casts a broad net over all of the experiential aspects of use, primarily subjective experience. User-centered design and design thinking are methods used to produce initial designs, after which they typically use iteration for design improvement. Service design is a relatively new area of design for usability and UX practitioners. The fundamental goal of usability testing is to help developers produce more usable products. The primary activity in diagnostic problem discovery tests is the discovery, prioritization, and resolution of usability problems.
Article
Senile dementia is one of the most common aging-related ailments; as brain functions deteriorate, the elderly become more dependent on others for their care. Mixed-reality ecosystems allow them to carry out their activities more easily, retain some independence, improve their quality of life, and perform exercise routines on their own, with the Internet enabling remote monitoring. This work proposes an architectural model for a mixed reality ecosystem to support older adults’ daily activities. It advocates the design of the ecosystem components, which are applied in two scenarios for rehabilitating patients’ visuo-constructive ability, with a more adequate and detailed combination and implementation of connectivity, software, and peripherals.
Article
Purpose High performance computing (HPC) is used to solve complex calculations that personal computing devices are unable to handle. HPC offers the potential for small- and medium-size enterprises (SMEs) to engage in product innovation, service improvement and the optimization of resource allocation (Borstnar and Ilijas, 2019). However, the expensive infrastructure, maintenance costs and resource knowledge gaps that accompany the use of HPC can make it inaccessible to SMEs. By moving HPC to the cloud, SMEs can gain access to the infrastructure without the requirement of owning or maintaining it, but they will need to accept the terms and conditions of the cloud contract. This paper aims to improve how SMEs access HPC through the cloud by providing insights into the terms and conditions of HPC cloud contracts. Design/methodology/approach This paper adopts a systematic literature review implementing a four-step approach. A comprehensive search was undertaken and the results synthesized to enable the paper’s objectives to be met. Findings This paper proposes that SMEs could gain competitive advantage(s) by understanding their own needs and improving their contract negotiation abilities, service management skills and risk management abilities before accepting the terms and conditions of the cloud contract. Furthermore, a checklist, service-level agreement, easily ignored elements and risk areas are presented as guidance for SMEs when reviewing their HPC cloud contract(s). Originality/value While HPC cloud contracts are a niche research topic, they are one of the key factors influencing the ability of SMEs to access HPC through the cloud. It is, however, by no means a level playing field, with SMEs at a distinct disadvantage because they have no influence over the drafting of the HPC cloud contract. The added value of the paper is that it contributes to our overall understanding of the terms and conditions of HPC cloud contracts.
Article
Usability and cognitive workload (CWL) are multidimensional constructs that describe user experience, predict performance, and inform system design. The relationship between the subjective measures of these constructs has not been adequately explored, especially in healthcare delivery settings where suboptimal usability of electronic health records and CWL of healthcare professionals are among the major contributing factors to medical errors. This study quantifies the perceived usability of a dosimetry quality assurance (QA) checklist and the perceived CWL of dosimetrists in radiation oncology clinical settings of an academic medical center and investigates the association between perceived usability and perceived CWL. Findings suggest that our institutional dosimetry QA checklist has suboptimal usability, but the associated CWL is acceptable. Further, the correlation analysis reveals that perceived usability and perceived CWL are non-overlapping constructs and may be jointly employed to reduce the risk of healthcare professionals committing medical errors.
Article
Full-text available
The available continuous measurement designs for longitudinal research on perceived usability include immediate and retrospective measurement methods. The present study was conducted to examine the gap between different continuous measurement designs. Thirty users completed a longitudinal study on typesetting software. We used two measurement methods to design three stages of testing. Each stage consists of one immediate measurement and two retrospective measurements. A long-term retrospective test was also conducted and SUS questionnaire measurements were gathered. We observed significant differences in perceived usability between participants with varying quantities of event experiences (i.e. the number of times they used the system). Within short periods of time (3 days), there may be significant differences between retrospective and immediate assessments—especially when users make negative evaluations. In the 3-day period, the user’s perceived usability retrospective evaluation is also based on the most recent experience as opposed to the most unpleasant. The recency effect is lost, however, over a long-term retrospective assessment (1 month later). This study is significant for two main reasons: (i) its unique methodology and (ii) its unprecedented level of consistency between measured results and actual user experiences. We hope that the results presented here provide a workable basis for continuous measurement design in future longitudinal studies. We found that in order to secure accurate measurements, it is necessary to set limits on the measurement time and the number of uses (events).
Article
Full-text available
Objective: To test the effectiveness and usability of the TRAT-C 2019 application in guiding the best therapeutic scheme for patients with hepatitis C. Methods: The tests were carried out with resident physicians at the viral hepatitis outpatient clinic of the Hospital Geral de Fortaleza. The sample consisted of 81 chronic hepatitis C patients. Results: There was a prevalence of genotype 1 (79%) and the presence of significant or advanced fibrosis in 77.7% of cases. Sustained virological response (SVR) was achieved in 98.1% of treated patients. The reduction in time to determine the appropriate therapeutic regimen obtained with the app was 60.6% compared to consulting traditional material. The usability evaluation with the System Usability Scale obtained an average score of 89.92. Conclusion: TRAT-C 2019 is easy to use and effective in defining the therapeutic scheme for hepatitis C, as evidenced by the SVR rate obtained following the treatment indicated by the application.
Article
Purpose Arm crank ergometry and adaptive rowing are existing exercise options for wheelchair users, but they are not commonly available. This study was conducted to explore exercise participation of wheelchair users, as well as the usability of the adaptive rowing ergometer (aROW) and arm crank ergometer (ACE). Methods This mixed-methods study used a concurrent triangulation design. Following completion of both exercise sessions (5 min each), participants (n = 14) with spinal cord injury/disease (SCI/D) completed the System Usability Scale (SUS) and a semi-structured interview. Participants were asked about the use of both exercise modalities and about general exercise participation. SUS data were analyzed using a paired sample t-test (see the sketch after this entry), and qualitative data were analyzed through conventional content analysis. Results Wheelchair users exercised for improved physical and mental health, as well as for functional independence and community participation; however, lack of accessible equipment was a prominent barrier. Both the aROW and ACE have high usability, but the aROW was perceived as more enjoyable and effective for cardiovascular exercise. Conclusions The implementation of the aROW into community gyms has the potential to help close the existing gap in inclusive equipment and may help people with disabilities to be more fully included in their community and lead healthier lives. Implications for rehabilitation:
• Wheelchair users perceive exercise as a meaningful activity that enhances physical health, functional independence, community participation, and overall social and emotional health, and reduces risk of disease.
• The adapted rowing machine was perceived as highly usable and was felt to be more enjoyable and effective for cardiovascular exercise compared to traditional arm crank ergometers.
• The adaptive rower provides an additional accessible equipment option for wheelchair users to obtain effective cardiovascular exercise.
• More available equipment may increase community participation and promote inclusion for wheelchair users.
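As referenced in the Methods above, a paired-sample comparison of SUS scores can be run as in the sketch below; the scores are invented and do not come from the study.

```python
# Sketch of a paired-sample t-test on SUS scores for two exercise modalities
# rated by the same participants (invented data, not the study's).
import numpy as np
from scipy import stats

sus_arow = np.array([85.0, 90.0, 77.5, 82.5, 95.0, 80.0, 87.5, 70.0])
sus_ace = np.array([75.0, 82.5, 70.0, 80.0, 85.0, 72.5, 77.5, 65.0])

t, p = stats.ttest_rel(sus_arow, sus_ace)
print(f"t = {t:.2f}, p = {p:.3f}, mean difference = {np.mean(sus_arow - sus_ace):.1f}")
```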
Article
Full-text available
The System Usability Scale (SUS) is the most widely used standardized questionnaire for the assessment of perceived usability. This review of the SUS covers its early history from inception in the 1980s through recent research and its future prospects. From relatively inauspicious beginnings, when its originator described it as a “quick and dirty usability scale,” it has proven to be quick but not “dirty.” It is likely that the SUS will continue to be a popular measurement of perceived usability for the foreseeable future. When researchers and practitioners need a measure of perceived usability, they should strongly consider using the SUS.
Article
Full-text available
The primary purpose of this research was to investigate the relationship between two widely used questionnaires designed to measure perceived usability: the Computer System Usability Questionnaire (CSUQ) and the System Usability Scale (SUS). The correlation between concurrently collected CSUQ and SUS scores was 0.76 (over 50% shared variance). After converting CSUQ scores to a 0–100-point scale (to match the range of the SUS scores), there was a small but statistically significant difference between CSUQ and SUS means. Although this difference (just under 2 scale points out of a possible 100) was statistically significant, it did not appear to be practically significant. Although usability practitioners should be cautious pending additional independent replication, it appears that CSUQ scores, after conversion to a 0–100-point scale, can be interpreted with the Sauro–Lewis curved grading scale. As a secondary research goal, investigation of variations of the Usability Metric for User Experience (UMUX) replicated previous findings that the regression-adjusted version of the UMUX-LITE (UMUX-LITEr) had the closest correspondence with concurrently collected SUS scores. Thus, even though these three standardized questionnaires were independently developed and have different item content and formats, they largely appear to be measuring the same thing, presumably, perceived usability.
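The CSUQ-to-0-100 conversion discussed above can be illustrated with a simple linear rescaling of the 7-point item mean (1 = best); this is one plausible conversion offered as an assumption, not necessarily the exact formula used in the paper.

```python
# Hedged sketch: linearly rescale a CSUQ item mean (7-point scale where
# 1 = strongly agree / best, 7 = worst) to a 0-100 scale so higher = better.
# This is one plausible conversion, not necessarily the paper's exact formula.
def csuq_to_100(item_responses):
    mean = sum(item_responses) / len(item_responses)
    return 100.0 * (7.0 - mean) / 6.0

responses = [2, 1, 3, 2, 2, 1, 2, 3, 2, 1, 2, 2, 3, 2, 1, 2]  # 16 hypothetical ratings
print(f"CSUQ (0-100): {csuq_to_100(responses):.1f}")
```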
Article
Full-text available
In 2009, we published a paper in which we showed how three independent sources of data indicated that, rather than being a unidimensional measure of perceived usability, the System Usability Scale apparently had two factors: Usability (all items except 4 and 10) and Learnability (Items 4 and 10). In that paper, we called for other researchers to report attempts to replicate that finding. The published research since 2009 has consistently failed to replicate that factor structure. In this paper, we report an analysis of over 9,000 completed SUS questionnaires that shows that the SUS is indeed bidimensional, but not in any interesting or useful way. A comparison of the fit of three confirmatory factor analyses showed that a model in which the SUS's positive-tone (odd-numbered) and negative-tone (even-numbered) items were aligned with two factors had a better fit than a unidimensional model (all items on one factor) or the Usability/Learnability model we published in 2009. Because a distinction based on item tone is of little practical or theoretical interest, we recommend that user experience practitioners and researchers treat the SUS as a unidimensional measure of perceived usability, and no longer routinely compute Usability and Learnability subscales.
Chapter
Full-text available
Covers the basics of usability testing plus some statistical topics (sample size estimation, confidence intervals, and standardized usability questionnaires).
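One sample size topic such treatments commonly cover is binomial problem discovery; the sketch below applies the standard 1 - (1 - p)^n model, treating the per-participant detection probability p as given. This is a generic illustration, not an excerpt from the chapter.

```python
# Sketch of the classic problem-discovery calculation: with per-participant
# detection probability p, the chance of seeing a problem at least once in
# n sessions is 1 - (1 - p)^n; inverting gives the sample size for a goal.
import math

def discovery_probability(p, n):
    return 1.0 - (1.0 - p) ** n

def sample_size_for_goal(p, goal=0.85):
    # Smallest n such that the discovery probability reaches the goal.
    return math.ceil(math.log(1.0 - goal) / math.log(1.0 - p))

print(discovery_probability(0.31, 5))    # ~0.84 with the oft-cited p = 0.31
print(sample_size_for_goal(0.31, 0.85))  # -> 6 participants
```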
Article
Full-text available
Usability Metric for User Experience (UMUX) and its shorter form variant UMUX-LITE are recent additions to standardized usability questionnaires. UMUX aims to measure perceived usability by employing fewer items that are in closer conformance with the ISO 9241 definition of usability, while UMUX-LITE conforms to the technology acceptance model (TAM). UMUX has been criticized regarding its reliability, validity, and sensitivity, but these criticisms are mostly based on reported findings associated with the data collected by the developer of the questionnaire. Our study re-evaluates the UMUX and UMUX-LITE scales using psychometric methods with data sets acquired through two usability evaluation studies: an online word processor evaluation survey (n = 405) and a web-based mind map software evaluation survey for three applications (n = 151). Data sets yielded similar results for indicators of reliability. Both UMUX and UMUX-LITE items were sensitive to the software when the scores for the evaluated software were not very close, but we could not detect a significant difference between the software when the scores were closer. UMUX and UMUX-LITE items were also sensitive to users’ level of experience with the software evaluated in this study. Neither of the scales was found to be sensitive to the participants’ age, gender, or whether they were native English speakers. The scales significantly correlated with the System Usability Scale (SUS) and the Computer System Usability Questionnaire (CSUQ), indicating their concurrent validity. The parallel analysis of principal components of UMUX pointed out a single latent variable, which was confirmed through a factor analysis, that showed the data fits better to a single-dimension factor structure.
Article
Full-text available
The use of applications on mobile devices has reached historic levels. Using the System Usability Scale (SUS), data were collected on the usability of applications used on two kinds of mobile platforms—phones and tablets—across two general classes of operating systems, iOS and Android. Over 4 experiments, 3,575 users rated the usability of 10 applications that had been selected based on their popularity, as well as 5 additional applications that users had identified as using frequently. The average SUS rating for the top 10 apps across all platforms was 77.7, with a nearly 20-point spread (67.7–87.4) between the highest and lowest rated apps. Overall, applications on phone platforms were judged to be more usable than applications on the tablet platforms. Practitioners can use the information in this article to make better design decisions and benchmark their progress against a known universe of apps for their specific mobile platform.
Article
Full-text available
This article describes the psychometric properties of the Emotional Metric Outcomes (EMO) questionnaire and the System Usability Scale (SUS) using data collected as part of a large-sample unmoderated usability study (n = 471). The EMO is a concise multifactor standardized questionnaire that provides an assessment of transaction-driven personal and relationship emotional outcomes, both positive and negative. The SUS is a well-known standardized usability questionnaire designed to assess perceived usability. In previous research, psychometric evaluation using data from a series of online surveys showed that the EMO and its component scales had high reliability and concurrent validity with loyalty and overall experience metrics but did not find the expected four-factor structure. Previous structural analyses of the SUS have had mixed results. Analysis of the EMO data from the usability study revealed the expected four-factor structure. The factor structure of the SUS appeared to be driven by item tone. The estimated reliability of the SUS (.90) was consistent with previous estimates. The EMO and its subscales were also quite reliable, with the estimates of reliability for the various EMO scales ranging from .86 to .96. Regression analysis using SUS, EMO, and Effort as predictors revealed different key drivers for the outcome metrics of Satisfaction and Likelihood-to-Recommend. The key recommendations are to include the EMO as part of the battery of poststudy standardized questionnaires, along with the SUS (or similar questionnaire), but to be cautious in reporting SUS subscales such as Usable and Learnable.
Article
Full-text available
The purpose of this research was to investigate various measurements of perceived usability, in particular, to assess (a) whether a regression formula developed previously to bring Usability Metric for User Experience LITE (UMUX-LITE) scores into correspondence with System Usability Scale (SUS) scores would continue to do so accurately with an independent set of data; (b) whether additional items covering concepts such as findability, reliability, responsiveness, perceived use by others, effectiveness, and visual appeal would be redundant with the construct of perceived usability or would align with other potential constructs; and (c) the dimensionality of the SUS as a function of self-reported frequency of use and expertise. Given the broad use of and emerging interpretative norms for the SUS, it was encouraging that the regression equation for the UMUX-LITE worked well with this independent set of data, although there is still a need to investigate its efficacy with a broader set of products and methods. Results from a series of principal components analyses indicated that most of the additional concepts, such as findability, familiarity, efficiency, control, and visual appeal covered the same statistical ground as the other more standard metrics for perceived usability. Two of the other items (Reliable and Responsive) made up a reliable construct named System Quality. None of the structural analyses of the SUS as a function of frequency of use or self-reported expertise produced the expected components, indicating the need for additional research in this area and a need to be cautious when using the Usable and Learnable components described in previous research.
Article
Full-text available
Nowadays, practitioners extensively apply quick and reliable scales of user satisfaction as part of their user experience (UX) analyses to obtain well-founded measures of user satisfaction within time and budget constraints. However, in the human-computer interaction (HCI) literature the relationship between the outcomes of standardized satisfaction scales and the amount of product usage has been only marginally explored. The few studies that have investigated this relationship have typically shown that users who have interacted more with a product have higher satisfaction. The purpose of this paper was to systematically analyze the variation in outcomes of three standardized user satisfaction scales (SUS, UMUX and UMUX-LITE) when completed by users who had spent different amounts of time with a website. In two studies, the amount of interaction was manipulated to assess its effect on user satisfaction. Measurements of the three scales were strongly correlated and their outcomes were significantly affected by the amount of interaction time. Notably, the SUS acted as a unidimensional scale when administered to people who had less product experience, but was bidimensional when administered to users with more experience. We replicated previous findings of similar magnitudes for the SUS and UMUX-LITE (after adjustment), but did not observe the previously reported similarities of magnitude for the SUS and the UMUX. Our results strongly encourage further research to analyze the relationships of the three scales with levels of product exposure. We also provide recommendations for practitioners and researchers in the use of the questionnaires.
Conference Paper
Full-text available
Over the recent years, the notion of a non-instrumental, hedonic quality of interactive products received growing interest. Based on a review of 151 publications, we summarize more than ten years research on the hedonic to provide an overview of definitions, assessment tools, antecedents, consequences, and correlates. We highlight a number of contributions, such as introducing experiential value to the practice of technology design and a better prediction of overall quality judgments and product acceptance. In addition, we suggest a number of areas for future research, such as providing richer, more nuanced models and tools for quantitative and qualitative analysis, more research on the consequences of using hedonic products and a better understanding of when the hedonic plays a role and when not.
Article
Full-text available
The philosopher of science J. W. Grove (1989) once wrote, “There is, of course, nothing strange or scandalous about divisions of opinion among scientists. This is a condition for scientific progress” (p. 133). Over the past 30 years, usability, both as a practice and as an emerging science, has had its share of controversies. It has inherited some from its early roots in experimental psychology, measurement, and statistics. Others have emerged as the field of usability has matured and extended into user-centered design and user experience. In many ways, a field of inquiry is shaped by its controversies. This article reviews some of the persistent controversies in the field of usability, starting with their history, then assessing their current status from the perspective of a pragmatic practitioner. Put another way: Over the past three decades, what are some of the key lessons we have learned, and what remains to be learned? Some of the key lessons learned are:
• When discussing usability, it is important to distinguish between the goals and practices of summative and formative usability.
• There is compelling rational and empirical support for the practice of iterative formative usability testing—it appears to be effective in improving both objective and perceived usability.
• When conducting usability studies, practitioners should use one of the currently available standardized usability questionnaires.
• Because “magic number” rules of thumb for sample size requirements for usability tests are optimal only under very specific conditions, practitioners should use the tools that are available to guide sample size estimation rather than relying on “magic numbers.”
Conference Paper
Full-text available
In this paper we present the UMUX-LITE, a two-item questionnaire based on the Usability Metric for User Experience (UMUX) [6]. The UMUX-LITE items are "This system's capabilities meet my requirements" and "This system is easy to use." Data from two independent surveys demonstrated adequate psychometric quality of the questionnaire. Estimates of reliability were .82 and .83 -- excellent for a two-item instrument. Concurrent validity was also high, with significant correlations with the SUS (.81, .81) and with likelihood-to-recommend (LTR) scores (.74, .73). The scores were sensitive to respondents' frequency of use. UMUX-LITE score means were slightly lower than those for the SUS, but easily adjusted using linear regression to match the SUS scores. Due to its parsimony (two items), reliability, validity, structural basis (usefulness and usability) and, after applying the corrective regression formula, its correspondence to SUS scores, the UMUX-LITE appears to be a promising alternative to the SUS when it is not desirable to use a 10-item instrument.
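The scoring and regression adjustment described above can be sketched as follows; the 0-100 rescaling of the two 7-point items and the adjustment coefficients (0.65, 22.9) are quoted from memory of the published formula and should be checked against the original paper before use.

```python
# Hedged sketch of UMUX-LITE scoring and its regression adjustment toward SUS.
# Assumptions: two 7-point items, raw score rescaled to 0-100 as
# (i1 + i2 - 2) * (100 / 12); the adjustment coefficients below (0.65, 22.9)
# are the commonly cited values, quoted from memory rather than this paper.
def umux_lite_raw(item1, item2):
    return (item1 + item2 - 2) * (100.0 / 12.0)

def umux_lite_adjusted(item1, item2, slope=0.65, intercept=22.9):
    return slope * umux_lite_raw(item1, item2) + intercept

print(umux_lite_raw(6, 6))        # 83.3 on the raw 0-100 scale
print(umux_lite_adjusted(6, 6))   # ~77.1 after the SUS-matching adjustment
```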
Article
Full-text available
In this paper we describe the development of a standardized computer satisfaction usability questionnaire for use with speakers of the Turkish language, the Turkish Computer System Usability Questionnaire, Short Version (T-CSUQ-SV). This new questionnaire, based on the English-language CSUQ, underwent careful translation and transformation through comprehensive psychometric evaluation. The results of the psychometric evaluation revealed an acceptable level of reliability, appropriate construct validity, and sensitivity to manipulation, indicating that Turkish usability practitioners should be able to use the T-CSUQ-SV with confidence when conducting user research.
Article
Full-text available
This study is a part of a research effort to develop the Questionnaire for User Interface Satisfaction (QUIS). Participants, 150 PC user group members, rated familiar software products. Two pairs of software categories were compared: 1) software that was liked and disliked, and 2) a standard command line system (CLS) and a menu driven application (MDA). The reliability of the questionnaire was high, Cronbach's alpha=.94. The overall reaction ratings yielded significantly higher ratings for liked software and MDA over disliked software and a CLS, respectively. Frequent and sophisticated PC users rated MDA more satisfying, powerful and flexible than CLS. Future applications of the QUIS on computers are discussed.
Conference Paper
Full-text available
Usability evaluators used an 18-item, post-study questionnaire in three related usability tests. I conducted an exploratory factor analysis to investigate statistical justification to combine items into subscales. The factor analysis indicated that three factors accounted for 87 percent of the total variance. Coefficient alpha analyses showed that the reliability of the overall summative scale was .97, and ranged from .91 to .96 for the three subscales. In the sensitivity analyses, the overall scale and all three subscales detected significant differences among the user groups; and one subscale indicated a significant system effect. Correlation analyses support the validity of the scales. The overall scale correlated highly with the sum of the After-Scenario Questionnaire ratings that participants gave after each scenario. The overall scale also correlated moderately with the percentage of successful scenario completion. These results are consistent with the hypothesis that these alternative measurements tap into a common underlying construct. This construct is probably usability, based on the content of the questionnaire items and the measurement context.
Article
Full-text available
The article addresses some concerns about how coefficient alpha is reported and used. It also shows that alpha is not a measure of homogeneity or unidimensionality. This fact and the finding that test length is related to reliability may cause significant misinterpretations of measures when alpha is used as evidence that a measure is unidimensional. For multidimensional measures, use of alpha as the basis for corrections for attenuation causes overestimates of true correlation. Satisfactory levels of alpha depend on test use and interpretation. Even relatively low (e.g., .50) levels of criterion reliability do not seriously attenuate validity coefficients. When reporting intercorrelations among measures that should be discriminable, it is important to present observed correlations, appropriate measures of reliability, and correlations corrected for unreliability.
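Coefficient alpha itself is straightforward to compute; the sketch below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), with synthetic data standing in for real questionnaire responses.

```python
# Sketch of coefficient (Cronbach's) alpha for a respondents-by-items matrix:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)  # shape: (respondents, items)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
latent = rng.normal(size=(50, 1))
data = latent + rng.normal(scale=0.8, size=(50, 8))  # 8 moderately correlated items
print(f"alpha = {cronbach_alpha(data):.2f}")
```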
Conference Paper
Full-text available
Correlations between prototypical usability metrics from 90 distinct usability tests were strong when measured at the task-level (r between .44 and .60). Using test-level satisfaction ratings instead of task-level ratings attenuated the correlations (r between .16 and .24). The method of aggregating data from a usability test had a significant effect on the magnitude of the resulting correlations. The results of principal components and factor analyses on the prototypical usability metrics provided evidence for an underlying construct of general usability with objective and subjective factors.
Conference Paper
Full-text available
When designing questionnaires there is a tradition of including items with both positive and negative wording to minimize acquiescence and extreme response biases. Two disadvantages of this approach are respondents accidentally agreeing with negative items (mistakes) and researchers forgetting to reverse the scales (miscoding). The original System Usability Scale (SUS) and an all positively worded version were administered in two experiments (n=161 and n=213) across eleven websites. There was no evidence for differences in the response biases between the different versions. A review of 27 SUS datasets found 3 (11%) were miscoded by researchers and 21 out of 158 questionnaires (13%) contained mistakes from users. We found no evidence that the purported advantages of including negative and positive items in usability questionnaires outweigh the disadvantages of mistakes and miscoding. It is recommended that researchers using the standard SUS verify the proper coding of scores and include procedural steps to ensure error-free completion of the SUS by users. Researchers can use the all positive version with confidence because respondents are less likely to make mistakes when responding, researchers are less likely to make errors in coding, and the scores will be similar to the standard SUS.
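The reverse-scoring step that miscoding errors typically involve is shown in the sketch below, which implements the standard SUS scoring convention (odd items contribute r - 1, even items 5 - r, sum multiplied by 2.5); the example responses are invented.

```python
# Sketch of standard SUS scoring, highlighting the reverse-coding step that
# the miscoding problems above refer to. Assumes ten responses on a 1-5 scale.
def sus_score(responses):
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)  # odd items: r-1, even items: 5-r
    return total * 2.5  # rescale the 0-40 sum to 0-100

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```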
Conference Paper
Full-text available
Since its introduction in 1986, the 10-item System Usability Scale (SUS) has been assumed to be unidimensional. Factor analysis of two independent SUS data sets reveals that the SUS actually has two factors - Usability (8 items) and Learnability (2 items). These new scales have reasonable reliability (coefficient alpha of .91 and .70, respectively). They correlate highly with the overall SUS (r = .985 and .784, respectively) and correlate significantly with one another (r = .664), but at a low enough level to use as separate scales. A sensitivity analysis using data from 19 tests had a significant Test by Scale interaction, providing additional evidence of the differential utility of the new scales. Practitioners can continue to use the current SUS as is, but, at no extra cost, can also take advantage of these new scales to extract additional information from their SUS data.
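Given the item groupings reported above (Learnability = items 4 and 10, Usability = the remaining eight items), the subscales can be extracted as in the sketch below; the 0-100 rescaling constants simply follow from the number of items per subscale and are stated as an assumption about the usual convention.

```python
# Hedged sketch of the Usability / Learnability subscales described above.
# Each item contributes 0-4 after standard SUS recoding; the multipliers
# (3.125 and 12.5) rescale the eight-item and two-item sums to 0-100.
def sus_contributions(responses):
    return [(r - 1) if i % 2 == 1 else (5 - r) for i, r in enumerate(responses, start=1)]

def sus_subscales(responses):
    c = sus_contributions(responses)
    learnability = (c[3] + c[9]) * 12.5           # items 4 and 10
    usability = (sum(c) - c[3] - c[9]) * 3.125    # the other eight items
    return usability, learnability

u, l = sus_subscales([4, 2, 5, 1, 4, 2, 5, 2, 4, 1])
print(f"Usability = {u:.1f}, Learnability = {l:.1f}")
```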
Article
Full-text available
Factor analysis of Post Study System Usability Questionnaire (PSSUQ) data from 5 years of usability studies (with a heavy emphasis on speech dictation systems) indicated a 3-factor structure consistent with that initially described 10 years ago: factors for System Usefulness, Information Quality, and Interface Quality. Estimated reliabilities (ranging from .83-.96) were also consistent with earlier estimates. Analyses of variance indicated that variables such as the study, developer, stage of development, type of product, and type of evaluation significantly affected PSSUQ scores. Other variables, such as gender and completeness of responses to the questionnaire, did not. Norms derived from this data correlated strongly with norms derived from the original PSSUQ data. The similarity of psychometric properties between the original and this PSSUQ data, despite the passage of time and differences in the types of systems studied, provide evidence of significant generalizability for the questionnaire, supporting its use by practitioners for measuring participant satisfaction with the usability of tested systems.
Article
Full-text available
The Usability Metric for User Experience (UMUX) is a four-item Likert scale used for the subjective assessment of an application’s perceived usability. It is designed to provide results similar to those obtained with the 10-item System Usability Scale, and is organized around the ISO 9241–11 definition of usability. A pilot version was assembled from candidate items, which was then tested alongside the System Usability Scale during usability testing. It was shown that the two scales correlate well, are reliable, and both align on one underlying usability factor. In addition, the Usability Metric for User Experience is compact enough to serve as a usability module in a broader user experience metric.
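A hedged sketch of UMUX scoring as it is commonly described (four 7-point items, two positively and two negatively worded) appears below; the scoring convention is quoted from memory rather than from this paper, so treat it as an assumption.

```python
# Hedged sketch of UMUX scoring as commonly described (quoted from memory,
# not from this paper): four 7-point items, odd items positively worded
# (scored r - 1), even items negatively worded (scored 7 - r); the sum of
# contributions (0-24) is rescaled to 0-100.
def umux_score(responses):
    if len(responses) != 4:
        raise ValueError("UMUX has exactly 4 items")
    total = sum((r - 1) if i % 2 == 1 else (7 - r)
                for i, r in enumerate(responses, start=1))
    return total * 100.0 / 24.0

print(umux_score([6, 2, 7, 3]))  # -> 83.3
```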
Article
Full-text available
This paper describes recent research in subjective usability measurement at IBM. The focus of the research was the application of psychometric methods to the development and evaluation of questionnaires that measure user satisfaction with system usability. The primary goals of this paper are to (1) discuss the psychometric characteristics of four IBM questionnaires that measure user satisfaction with computer system usability, and (2) provide the questionnaires, with administration and scoring instructions. Usability practitioners can use these questionnaires with confidence to help them measure users' satisfaction with the usability of computer systems.
Article
Full-text available
Valid measurement scales for predicting user acceptance of computers are in short supply. Most subjective measures used in practice are unvalidated, and their relationship to system usage is unknown. The present research develops and validates new scales for two specific variables, perceived usefulness and perceived ease of use, which are hypothesized to be fundamental determinants of user acceptance. Definitions for these two variables were used to develop scale items that were pretested for content validity and then tested for reliability and construct validity in two studies involving a total of 152 users and four application programs. The measures were refined and streamlined, resulting in two six-item scales with reliabilities of .98 for usefulness and .94 for ease of use. The scales exhibited high convergent, discriminant, and factorial validity. Perceived usefulness was significantly correlated with both self-reported current usage (r=.63, Study 1) and self-predicted future usage (r =.85, Study 2). Perceived ease of use was also significantly correlated with current usage (r=.45, Study 1) and future usage (r=.59, Study 2). In both studies, usefulness had a significantly greater correlation with usage behavior than did ease of use. Regression analyses suggest that perceived ease of use may actually be a causal antecedent to perceived usefulness, as opposed to a parallel, direct determinant of system usage. Implications are drawn for future research on user acceptance.
Article
Full-text available
The System Usability Scale (SUS), developed by Brooke (Usability evaluation in industry, Taylor & Francis, London, pp. 189-194, 1996), has had great success among usability practitioners because it is a quick and easy-to-use measure for collecting users' usability evaluations of a system. Recently, Lewis and Sauro (Proceedings of the Human Computer Interaction International Conference (HCII 2009), San Diego, CA, USA, 2009) proposed a two-factor structure, Usability (8 items) and Learnability (2 items), suggesting that practitioners might take advantage of these new factors to extract additional information from SUS data. In order to verify the dimensionality of the SUS's two-component structure, we estimated the parameters and tested the SUS structure with a structural equation model on a sample of 196 university users. Our data indicated that both the unidimensional model and the two-factor model with uncorrelated factors proposed by Lewis and Sauro (2009) did not have a satisfactory fit to the data. We thus relaxed the hypothesis that Usability and Learnability are independent components of SUS ratings and tested a less restrictive model with correlated factors. This model not only yielded a good fit to the data, but it was also significantly more appropriate for representing the structure of SUS ratings.
Book
You're being asked to quantify your usability improvements with statistics. But even practitioners with a background in statistics are often hesitant to analyze their data statistically, unsure which statistical tests to use and how to defend the use of small test sample sizes. This book is a practical guide to solving the common quantitative problems that arise in usability testing with statistics. It addresses common questions you face every day, such as: Is the current product more usable than our competition? Can we be sure at least 70% of users can complete the task on the first attempt? How long will it take users to purchase products on the website? The book shows you which test to use and provides a foundation for both the statistical theory and best practices in applying it. The authors draw on decades of statistical literature from human factors, industrial engineering, and psychology, as well as their own published research, to provide the best solutions. They provide concrete solutions (Excel formulas, links to their own web calculators) along with an engaging discussion of the statistical reasons why the tests work and how to effectively communicate the results. *Provides practical guidance on solving usability testing problems with statistics for any project, including those using Six Sigma practices *Shows practitioners which test to use, why it works, and best practices in application, along with easy-to-use Excel formulas and web calculators for analyzing data *Recommends ways for practitioners to communicate results to stakeholders in plain English. © 2012 Jeff Sauro and James R. Lewis Published by Elsevier Inc. All rights reserved.
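The benchmark question quoted above (can we be sure at least 70% of users complete the task on the first attempt?) maps naturally onto a one-sided binomial test; the sketch below uses invented counts and scipy's exact test rather than the book's own worked examples.

```python
# Sketch of the benchmark question quoted above: with 14 of 15 users
# completing the task, can we conclude the true completion rate exceeds 70%?
# Uses a one-sided exact binomial test (invented numbers).
from scipy.stats import binomtest

result = binomtest(k=14, n=15, p=0.70, alternative="greater")
print(f"observed completion rate = {14/15:.2f}, p-value = {result.pvalue:.3f}")
# p is roughly 0.035, so the 70% benchmark is exceeded at the .05 level.
```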
Article
The Usability Metric for User Experience (UMUX) is a four-item Likert scale aimed at replicating the psychometric properties of the System Usability Scale (SUS) in a more compact form. As part of a special issue of the journal Interacting with Computers, the UMUX is being examined in terms of purpose, reliability, validity and structure. This response to commentaries addresses concerns with these issues through updated archival research, deeper analysis on the original data and some updated results with an average-scoring system. The new results show the UMUX performs as expected for a wide range of systems and consists of one underlying usability factor.
Article
This paper characterizes the usability of 14 common, everyday products using the System Usability Scale (SUS). Over 1,000 users were queried about the usability of these products using an online survey methodology. The study employed two novel applications of the SUS. First, participants were not asked to perform specific tasks on these products before rating their usability, but were rather asked to assess usability based on their overall integrated experience with a given product. Second, some of the evaluated products were assessed as a class of products (e.g. ‘microwaves’) rather than a specific make and model, as is typically done. The results show clear distinctions among different products and will provide practitioners and researchers with important known benchmarks as they seek to characterize and describe results from their own usability studies.
Article
This chapter discusses the user's experience and evolution of usability engineering. Usability engineering starts with a commitment to action in the world. It seeks to capture user experience within a context situated in user work and in a form useful for engineering. Usability engineering provides operationally defined criteria so that usability objectives can be used to drive an efficient and productive engineering effort, and it can lead to production of systems that are experienced by users as usable and that serve as a basis for the next generation of systems. When a system is built and delivered to users, interaction with it would affect user experience and would shift the background against which users evaluate that system in comparison with other systems. Therefore, as systems are built that provide new functionality with new levels of usability, the expectations of users would shift so that the whole cycle should begin again.
Book
Measuring the User Experience was the first book that focused on how to quantify the user experience. Now in the second edition, the authors include new material on how recent technologies have made it easier and more effective to collect a broader range of data about the user experience. As more UX and web professionals need to justify their design decisions with solid, reliable data, Measuring the User Experience provides the quantitative analysis training that these professionals need. The second edition presents new metrics such as emotional engagement, personas, keystroke analysis, and net promoter score. It also examines how new technologies coming from neuro-marketing and online market research can refine user experience measurement, helping usability and user experience practitioners make business cases to stakeholders. The book also contains new research and updated examples, including tips on writing online survey questions, six new case studies, and examples using the most recent version of Excel.
Article
The System Usability Scale (SUS) was administered verbally to native English and non-native English speakers for several internally deployed applications. It was found that a significant proportion of non-native English speakers failed to understand the word "cumbersome" in Item 8 of the SUS (that is, "I found the system to be very cumbersome to use.") This finding has implications for reliability and validity when the questionnaire is distributed electronically in multinational usability efforts.
Article
The System Usability Scale (SUS) is an inexpensive, yet effective tool for assessing the usability of a product, including Web sites, cell phones, interactive voice response systems, TV applications, and more. It provides an easy-to-understand score from 0 (negative) to 100 (positive). While a 100-point scale is intuitive in many respects and allows for relative judgments, information describing how the numeric score translates into an absolute judgment of usability is not known. To help answer that question, a seven-point adjective-anchored Likert scale was added as an eleventh question to nearly 1,000 SUS surveys. Results show that the Likert scale scores correlate extremely well with the SUS scores (r=0.822). The addition of the adjective rating scale to the SUS may help practitioners interpret individual SUS scores and aid in explaining the results to non-human factors professionals.
Article
This article presents nearly 10 years' worth of System Usability Scale (SUS) data collected on numerous products in all phases of the development lifecycle. The SUS, developed by Brooke (1996), reflected a strong need in the usability community for a tool that could quickly and easily collect a user's subjective rating of a product's usability. The data in this study indicate that the SUS fulfills that need. Results from the analysis of this large number of SUS scores show that the SUS is a highly robust and versatile tool for usability professionals. The article presents these results and discusses their implications, describes nontraditional uses of the SUS, explains a proposed modification to the SUS to provide an adjective rating that correlates with a given score, and provides details of what constitutes an acceptable SUS score.
Grier, R. A., Bangor, A., Kortum, P. T., & Peres, S. C. (2013). The System Usability Scale: Beyond standard usability testing. In Proceedings of the Human Factors and Ergonomics Society (pp. 187-191). Santa Monica, CA: Human Factors and Ergonomics Society.
Kirakowski, J., & Dillon, A. (1988). The Computer User Satisfaction Inventory (CUSI): Manual and scoring key. Cork, Ireland: Human Factors Research Group, University College of Cork.
Tullis, T. S., & Stetson, J. N. (2004). A comparison of questionnaires for assessing website usability. Paper presented at the Usability Professionals Association Annual Conference. Minneapolis, MN: UPA. Retrieved September 13, 2017, from https://www.researchgate.net/publication/228609327_A_Comparison_of_Questionnaires_for_Assessing_Website_Usability.
James R. Lewis is a senior human factors engineer at IBM, currently focusing on the design/evaluation of conversational applications. He has published influential papers in the areas of usability testing and measurement, including the books Practical Speech User Interface Design and (with Jeff Sauro) Quantifying the User Experience.