Construction and first Validation of
Extension Scales for the User Experience
Questionnaire (UEQ)
Martin Schrepp
SAP SE
Jörg Thomaschewski
University of Applied Sciences Emden/Leer
26.06.2019
Contents

1 Research Goal
2 Scale Construction
  2.1 Construction of candidate items
  2.2 Setup of the study
  2.3 Results
    2.3.1 Aesthetics
    2.3.2 Adaptability
    2.3.3 Usefulness
    2.3.4 Intuitive Use
    2.3.5 Value
    2.3.6 Content Quality
3 Scale Validation
  3.1 Setup of the study
  3.2 Results
    3.2.1 Web-Shops
    3.2.2 Video-Platforms
    3.2.3 Programming Environments
    3.2.4 Overall Results
List of Figures
List of Tables
Bibliography
Appendix: Screenshots of the used questionnaires
1 Research Goal
The UEQ (Laugwitz, Schrepp & Held, 2008) is a frequently used questionnaire that measures user experience (UX) on 6 distinct scales (Attractiveness, Efficiency, Perspicuity, Dependability, Stimulation, Novelty). Of course, these 6 scales do not cover the entire spectrum of UX. For some products, special UX aspects not contained in the UEQ are highly important for the overall UX impression. For this reason, some authors have already created extension scales for the UEQ: Hinderks (2016) created a scale to measure Trust, and Boos & Brau (2017) created scales for Acoustics and Haptics (properties important for household appliances). To cover a broader range of UX, we describe the construction and a first validation of several additional extension scales.
This research report gives a detailed description of the data analysis done for the scale construction and of the first validation of the extension scales. The context of the research and the application of the extension scales are explained in several other publications and are not part of this document.
2 Scale Construction
A paper by Winter, Hinderks, Schrepp & Thomaschewski (2017) investigates the importance of different UX aspects for typical product categories. The list of UX aspects used in that paper is the basis for our scale construction. We selected the following UX aspects, which are important for a wide range of product categories: Aesthetics, Adaptability, Usefulness, Intuitive Use, Value and Content Quality. For each of these aspects we constructed a UEQ extension scale.
2.1 Construction of candidate items
For each selected UX aspect, several potential German items in the UEQ format (semantic differentials with a 7-point Likert scale) were constructed by two experts. In the following, we call these the candidate items. The candidate items per scale are listed below as part of the data analysis.
2.2 Setup of the study
191 students of the University of Applied Sciences Emden/Leer (119 male, 72 female, average age 30.42) rated several products from 10 different product categories (for example, learning platforms, online banking, social networks, video portals, programming environments) on all constructed candidate items. Participation was voluntary.
2.3 Results
The data per scale were analysed with a principal component analysis (with varimax rotation) under the assumption that all items of a scale describe the same UX aspect, i.e. the number of factors was set to one. The R package psych (Revelle, 2017) was used for the analysis.
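To illustrate the procedure, the core of this analysis can be reproduced with a few lines of R. The following is a minimal sketch, assuming the ratings for one candidate set are collected in a data frame with one column per item; all variable names are hypothetical, and the original analysis scripts are not part of this report.

```r
library(psych)

# 'ratings' is a hypothetical data frame: one row per product rating,
# one column per candidate item, recoded from the 7-point scale to -3..+3.
pca <- principal(ratings, nfactors = 1, rotate = "varimax")
print(pca)  # loadings (PC1), communalities (h2), uniquenesses (u2), fit

# Pick the four items with the highest loadings as representatives of the scale.
loadings_pc1 <- pca$loadings[, 1]
selected_items <- names(sort(abs(loadings_pc1), decreasing = TRUE))[1:4]
```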
The results showed that, except for the scale Content Quality, a single factor explains the data quite well. The items with the highest loadings on this factor were selected as representatives of the scale.
For Content Quality, a two-factor solution showed a better fit, and the two factors had a reasonable semantic interpretation. Thus, it was decided to split the items into two sets representing different UX aspects, i.e. to construct two extension scales instead of one. These scales were named Quality of Content and Trustworthiness of Content, based on the meaning of the items that load on them.
2.3.1 Aesthetics
Figure 1: Results of Principal Component Analysis and Factorial Analysis for Aesthetics.
Proportion of Variance explained: 0.64
Fit based upon off diagonal values = 0.99 (values > 0.95 indicate a good fit).
Table 1: Loadings of the candidate items for Aesthetics.

| Original German Item | PC1 | h2 | u2 |
|---|---|---|---|
| **hässlich / schön** | 0.89 | 0.80 | 0.20 |
| **stillos / stilvoll** | 0.86 | 0.75 | 0.25 |
| **nicht ansprechend / ansprechend** | 0.88 | 0.78 | 0.22 |
| farblich unschön / farblich schön | 0.79 | 0.62 | 0.38 |
| unharmonisch / harmonisch | 0.84 | 0.71 | 0.29 |
| **unästhetisch / ästhetisch** | 0.88 | 0.77 | 0.23 |
| nicht kunstvoll / kunstvoll | 0.63 | 0.40 | 0.60 |
| unüberlegt / durchdacht | 0.51 | 0.26 | 0.74 |

English translation of the selected items for the scale (bold font in the table): ugly / beautiful, lacking style / stylish, unappealing / appealing, unpleasant / pleasant.
2.3.2 Adaptability
Figure 2: Results of Principal Component Analysis and Factorial Analysis for Adaptability.
Proportion of Variance explained: 0.75
Fit based upon off diagonal values = 0.99 (values > 0.95 indicate a good fit).
Table 2: Loadings of the candidate items for Adaptability.

| Original German Item | PC1 | h2 | u2 |
|---|---|---|---|
| **starr / flexibel** | 0.86 | 0.73 | 0.27 |
| **nicht anpassbar / anpassbar** | 0.89 | 0.79 | 0.21 |
| **nicht veränderbar / veränderbar** | 0.89 | 0.79 | 0.21 |
| **nicht erweiterbar / erweiterbar** | 0.87 | 0.75 | 0.25 |
| nicht einstellbar / einstellbar | 0.86 | 0.74 | 0.26 |
| nicht anpassungsfähig / anpassungsfähig | 0.84 | 0.70 | 0.30 |

English translation of the selected items for the scale (bold font in the table): not adjustable / adjustable, not changeable / changeable, inflexible / flexible, not extendable / extendable.
2.3.3 Usefulness
Figure 3: Results of Principal Component Analysis and Factorial Analysis for Usefulness.
Proportion of Variance explained: 0.65
Fit based upon off diagonal values = 0.99 (values > 0.95 indicate a good fit).
Table 3: Loadings of the candidate items for Usefulness.

| Original German Item | PC1 | h2 | u2 |
|---|---|---|---|
| unpraktisch / praktisch | 0.78 | 0.62 | 0.38 |
| **nutzlos / nützlich** | 0.83 | 0.69 | 0.31 |
| unbrauchbar / brauchbar | 0.79 | 0.62 | 0.38 |
| **nicht hilfreich / hilfreich** | 0.84 | 0.70 | 0.30 |
| nicht zweckmäßig / zweckmäßig | 0.76 | 0.58 | 0.42 |
| **nicht vorteilhaft / vorteilhaft** | 0.86 | 0.74 | 0.26 |
| **nicht lohnend / lohnend** | 0.80 | 0.64 | 0.36 |
| unproduktiv / produktiv | 0.76 | 0.58 | 0.42 |

English translation of the selected items for the scale (bold font in the table): useless / useful, not helpful / helpful, not beneficial / beneficial, not rewarding / rewarding.
2.3.4 Intuitive Use
Figure 4: Results of Principal Component Analysis and Factorial Analysis for Intuitive Use.
Proportion of Variance explained: 0.74
Fit based upon off diagonal values = 0.99 (values > 0.95 indicate a good fit).
Table 4: Loadings of the candidate items for Intuitive Use.

| Original German Item | PC1 | h2 | u2 |
|---|---|---|---|
| nicht intuitiv / intuitiv | 0.86 | 0.74 | 0.26 |
| nicht direkt / direkt | 0.86 | 0.73 | 0.27 |
| nicht spontan / spontan | 0.69 | 0.48 | 0.52 |
| unklar / klar | 0.87 | 0.76 | 0.24 |
| **mühevoll / mühelos** | 0.88 | 0.78 | 0.22 |
| **unlogisch / logisch** | 0.90 | 0.82 | 0.18 |
| **nicht einleuchtend / einleuchtend** | 0.89 | 0.80 | 0.20 |
| **nicht schlüssig / schlüssig** | 0.90 | 0.81 | 0.19 |

English translation of the selected items for the scale (bold font in the table): difficult / easy, illogical / logical, not plausible / plausible, inconclusive / conclusive.
2.3.5 Value
Figure 5: Results of Principal Component Analysis and Factorial Analysis for Value.
Proportion of Variance explained: 0.52
Fit based upon off diagonal values = 0.96 (values > 0.95 indicate a good fit).
Table 5: Loadings of the candidate items for Value.

| Original German Item | PC1 | h2 | u2 |
|---|---|---|---|
| **minderwertig / wertvoll** | 0.83 | 0.68 | 0.32 |
| stilvoll / stillos | 0.48 | 0.23 | 0.77 |
| **nicht vorzeigbar / vorzeigbar** | 0.79 | 0.63 | 0.37 |
| **nicht geschmackvoll / geschmackvoll** | 0.81 | 0.65 | 0.35 |
| konzeptlos / kunstfertig | 0.50 | 0.25 | 0.75 |
| laienhaft / fachmännisch | 0.68 | 0.47 | 0.53 |
| **nicht elegant / elegant** | 0.82 | 0.67 | 0.33 |
| unvollkommen / vollkommen | 0.77 | 0.59 | 0.41 |

English translation of the selected items for the scale (bold font in the table): inferior / valuable, not presentable / presentable, tasteless / tasteful, not elegant / elegant.
2.3.6 Content Quality
Figure 6: Results of Principal Component Analysis and Factorial Analysis for Content Quality.
The Kaiser-Guttman criterion (eigenvalues > 1) as well as the scree plot indicate that a two-factor solution may fit the data better than a one-factor solution in the PCA. Since the two factors could also be interpreted semantically, two scales were defined.
Proportion of Variance explained: 0.54 (Factor 1), 0.46 (Factor 2).
Fit based upon off diagonal values = 0.97 (values > 0.95 indicate a good fit).
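The number-of-factors check and the two-factor solution can be sketched in R as follows; again a sketch with hypothetical variable names, not the original analysis script.

```r
library(psych)

# 'ratings_cq' is a hypothetical data frame with the ten candidate items.
scree(ratings_cq)  # eigenvalue plot; components with eigenvalues > 1 suggest retention

pca2 <- principal(ratings_cq, nfactors = 2, rotate = "varimax")
print(pca2$loadings, cutoff = 0.4)  # shows how the items split onto PC1 and PC2
```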
Table 6: Loadings of the candidate items for Content Quality.

| Original German Item | PC1 | PC2 | h2 | u2 |
|---|---|---|---|---|
| **veraltet / aktuell** | 0.32 | 0.64 | 0.51 | 0.49 |
| nicht informativ / informativ | 0.55 | 0.56 | 0.61 | 0.39 |
| **uninteressant / interessant** | 0.21 | 0.68 | 0.50 | 0.50 |
| **schlecht aufbereitet / gut aufbereitet** | 0.30 | 0.77 | 0.68 | 0.32 |
| minderwertig / hochwertig | 0.10 | 0.78 | 0.62 | 0.38 |
| **unverständlich / gut verständlich** | 0.58 | 0.58 | 0.67 | 0.33 |
| **nutzlos / nützlich** | 0.68 | 0.33 | 0.57 | 0.43 |
| **unglaubwürdig / glaubwürdig** | 0.90 | 0.18 | 0.84 | 0.16 |
| **unseriös / seriös** | 0.89 | 0.20 | 0.83 | 0.17 |
| **ungenau / genau** | 0.77 | 0.28 | 0.66 | 0.34 |

English translation of the selected items for the scale PC2 (named Quality of Content): obsolete / up-to-date, not interesting / interesting, poorly prepared / well prepared, incomprehensible / comprehensible.
English translation of the selected items for the scale PC1 (named Trustworthiness of Content): useless / useful, implausible / plausible, untrustworthy / trustworthy, inaccurate / accurate.
3 Scale Validation
In the following, we report first validation results for the new scales. Of course, scale validation will be continued in further studies.
3.1 Setup of the study
To evaluate the scale quality, the three product categories Web Shops, Video Platforms and Programming Environments were selected. Per product category, two well-known products were chosen (Web Shops: Otto.de, Zalando.de; Video Platforms: Netflix, Amazon Prime; Programming Environments: Eclipse, Visual Studio).
Per product category, a specialized UX questionnaire was constructed that contains the scales that seem to be most important for products of this category (see Winter, Hinderks, Schrepp & Thomaschewski, 2017 for details). The scales per category can be found in the results section below. The questionnaires combined the new extension scales with existing scales from the UEQ.
Participants were recruited via e-mail campaigns and links posted on websites. Each participant chose one product that he or she used regularly from one of the product categories; thus, we have different numbers of ratings for the different products.
The following data were collected per questionnaire:

- Age
- Gender
- Per extension scale:
  - The 4 items of the scale
  - An overall rating concerning the importance of the scale for the evaluated product
- A rating for the overall satisfaction with the evaluated product
Screenshots of the used HTML questionnaires can be found in the Appendix.
The importance ratings and the item ratings per scale are used to calculate a UX KPI. The calculation uses the same procedure as described in Hinderks, Schrepp, Mayo, Escalona & Thomaschewski (2019).
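The exact procedure is defined in the cited paper. As a rough sketch, the KPI can be understood as an importance-weighted average of the scale means; the following function only illustrates this idea and is an assumption based on Hinderks et al. (2019), not a verbatim reimplementation (the paper also specifies details such as the recoding of the importance ratings).

```r
# Sketch of an importance-weighted UX KPI for one participant.
# scale_means: one mean value per scale (items recoded to -3..+3)
# importance:  importance ratings for the same scales, recoded to a
#              positive scale so that they can serve as weights
ux_kpi <- function(scale_means, importance) {
  weights <- importance / sum(importance)
  sum(weights * scale_means)
}

ux_kpi(scale_means = c(1.4, 2.0, 0.8), importance = c(5, 7, 3))  # -> 1.56
```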
3.2 Results
The following Table 7 shows some results concerning participation and some demographic
data of the participants.
Table 7: Overview of participation and demographic information for all studies. Time is the time in ms between starting the online questionnaire and clicking the submit button. Clicks is the average number of clicks by a participant. Scales is the number of scales used in the study (the scales used per study are described below).

| Product | N | Av. Age | Gender | Time | Scales |
|---|---|---|---|---|---|
| otto.de | 42 | 34 | 16 m, 25 f, 1 NA | 202899 | 8 |
| zalando.de | 46 | 31 | 20 m, 24 f, 2 NA | 187803 | 8 |
| Netflix | 73 | 31 | 42 m, 27 f, 4 NA | 211112 | 7 |
| Amazon Prime | 57 | 32 | 36 m, 21 f | 259491 | 7 |
| Eclipse | 14 | 36 | 7 m, 4 f, 3 NA | 368552 | 7 |
| Visual Studio | 29 | 32 | 25 m, 1 f, 3 NA | 225006 | 7 |
Please note that per scale, the 4 items plus the importance of the scale must be rated; for 8 scales this alone requires 40 clicks. In addition, the overall satisfaction must be rated, and a few clicks are required to state age and gender.
Thus, filling out the questionnaires seems to require little effort from the participants. They spent around 4 minutes answering the questions, and selected answers do not seem to have been changed often afterwards. This indicates that the terms used are neither problematic nor difficult to understand.
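The Confidence column in the following tables is consistent with the half-width of a standard 95% normal-approximation confidence interval for the scale mean; for example, for Attractiveness on Otto.de, 1.96 * 1.19 / sqrt(42) = 0.36. A minimal R sketch, assuming this formula:

```r
# 95% confidence interval for a scale mean, given the per-participant
# scale scores x of one product.
conf_interval <- function(x, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)       # 1.96 for a 95% interval
  half <- z * sd(x) / sqrt(length(x))   # the "Confidence" column
  c(mean = mean(x), confidence = half,
    lower = mean(x) - half, upper = mean(x) + half)
}
```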
3.2.1 Web-Shops
Results for Otto.de
Table 8: Scale means, standard deviation and confidence per scale for Otto.de.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 1.30 | 1.43 | 1.19 | 42 | 0.36 | 0.94 | 1.66 |
| Dependability | 1.58 | 1.18 | 1.08 | 42 | 0.33 | 1.26 | 1.91 |
| Intuitive Use | 1.57 | 1.19 | 1.09 | 42 | 0.33 | 1.24 | 1.89 |
| Visual Aesthetics | 0.89 | 2.01 | 1.41 | 42 | 0.43 | 0.46 | 1.32 |
| Quality of Content | 1.35 | 1.28 | 1.13 | 42 | 0.34 | 1.00 | 1.69 |
| Trustworthiness of Content | 1.33 | 1.32 | 1.15 | 42 | 0.35 | 0.98 | 1.67 |
| Trust | 1.28 | 1.45 | 1.20 | 42 | 0.36 | 0.92 | 1.64 |
| Value | 0.93 | 1.56 | 1.24 | 42 | 0.38 | 0.56 | 1.31 |
Figure 7: Scale means and confidence intervals per scale for Otto.de.
Table 9: Importance ratings and confidence intervals for Otto.de.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 1.14 | 1.74 | 1.30 | 42 | 0.39 | 0.75 | 1.54 |
| Dependability | 1.59 | 1.05 | 1.01 | 42 | 0.31 | 1.28 | 1.89 |
| Intuitive Use | 1.86 | 1.25 | 1.10 | 42 | 0.33 | 1.52 | 2.19 |
| Visual Aesthetics | 0.95 | 2.19 | 1.46 | 42 | 0.44 | 0.51 | 1.39 |
| Quality of Content | 1.71 | 1.11 | 1.04 | 42 | 0.32 | 1.39 | 2.02 |
| Trustworthiness of Content | 1.81 | 0.99 | 0.98 | 42 | 0.30 | 1.51 | 2.11 |
| Trust | 2.02 | 1.07 | 1.02 | 42 | 0.31 | 1.71 | 2.33 |
| Value | 0.67 | 1.50 | 1.21 | 42 | 0.37 | 0.30 | 1.03 |
Figure 8: Importance ratings and their confidence intervals for Otto.de.
Cronbach Alpha values for the scales:
Attractiveness: 0.93
Dependability: 0.82
Intuitive Use: 0.94
Visual Aesthetics: 0.95
Quality of Content: 0.89
Trustworthiness of Content: 0.86
Trust: 0.90
Value: 0.93
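The reliability values can be computed with the psych package as well; a minimal sketch, where 'scale_items' is a hypothetical data frame holding the four item columns of one scale:

```r
library(psych)

reliability <- alpha(scale_items)  # psych::alpha
reliability$total$raw_alpha        # the Cronbach Alpha value reported above
```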
Results for Zalando.de
Table 10: Scale means, standard deviation and confidence per scale for Zalando.de.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 1.68 | 1.20 | 1.09 | 46 | 0.32 | 1.36 | 2.00 |
| Dependability | 2.02 | 0.79 | 0.89 | 46 | 0.26 | 1.76 | 2.27 |
| Intuitive Use | 2.13 | 0.76 | 0.87 | 46 | 0.25 | 1.88 | 2.38 |
| Visual Aesthetics | 1.47 | 1.68 | 1.29 | 46 | 0.37 | 1.09 | 1.84 |
| Quality of Content | 1.91 | 0.93 | 0.96 | 46 | 0.28 | 1.63 | 2.19 |
| Trustworthiness of Content | 1.73 | 1.05 | 1.02 | 46 | 0.30 | 1.43 | 2.02 |
| Trust | 1.26 | 1.42 | 1.19 | 46 | 0.34 | 0.92 | 1.60 |
| Value | 1.58 | 1.34 | 1.16 | 46 | 0.33 | 1.25 | 1.91 |
Figure 9: Scale means and confidence intervals per scale for Zalando.de.
Table 11: Importance ratings and confidence intervals for Zalando.de.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 0.93 | 1.93 | 1.37 | 46 | 0.40 | 0.54 | 1.33 |
| Dependability | 1.89 | 0.81 | 0.89 | 46 | 0.26 | 1.63 | 2.15 |
| Intuitive Use | 2.17 | 0.95 | 0.96 | 46 | 0.28 | 1.90 | 2.45 |
| Visual Aesthetics | 1.35 | 2.10 | 1.43 | 46 | 0.41 | 0.93 | 1.76 |
| Quality of Content | 1.87 | 1.09 | 1.03 | 46 | 0.30 | 1.57 | 2.17 |
| Trustworthiness of Content | 2.04 | 0.93 | 0.95 | 46 | 0.28 | 1.77 | 2.32 |
| Trust | 2.22 | 0.80 | 0.88 | 46 | 0.26 | 1.96 | 2.47 |
| Value | 1.27 | 1.75 | 1.31 | 45 | 0.38 | 0.88 | 1.65 |
Figure 10: Importance ratings and their confidence intervals for Zalando.de.
Cronbach Alpha values for the scales:
Attractiveness: 0.92
Dependability: 0.85
Intuitive Use: 0.90
Visual Aesthetics: 0.95
Quality of Content: 0.78
Trustworthiness of Content: 0.81
Trust: 0.93
Value: 0.88
3.2.2 Video-Platforms
Results for Netflix
Table 12: Scale means, standard deviation and confidence per scale for Netflix.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 2.13 | 1.14 | 1.06 | 73 | 0.24 | 1.88 | 2.37 |
| Perspicuity | 2.04 | 1.41 | 1.19 | 73 | 0.27 | 1.77 | 2.31 |
| Intuitive Use | 1.86 | 1.35 | 1.16 | 73 | 0.27 | 1.60 | 2.13 |
| Visual Aesthetics | 1.58 | 1.37 | 1.17 | 73 | 0.27 | 1.32 | 1.85 |
| Quality of Content | 1.83 | 1.51 | 1.23 | 73 | 0.28 | 1.55 | 2.11 |
| Trustworthiness of Content | 1.48 | 1.26 | 1.12 | 73 | 0.26 | 1.22 | 1.74 |
| Trust | 1.03 | 1.97 | 1.40 | 73 | 0.32 | 0.71 | 1.35 |

Figure 11: Scale means and confidence intervals per scale for Netflix.
Table 13: Importance ratings and confidence intervals for Netflix.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 1.45 | 1.58 | 1.25 | 73 | 0.29 | 1.17 | 1.74 |
| Perspicuity | 2.22 | 0.65 | 0.80 | 73 | 0.18 | 2.04 | 2.40 |
| Intuitive Use | 1.96 | 1.11 | 1.05 | 73 | 0.24 | 1.72 | 2.20 |
| Visual Aesthetics | 1.10 | 1.95 | 1.39 | 73 | 0.32 | 0.78 | 1.41 |
| Quality of Content | 1.79 | 1.19 | 1.08 | 73 | 0.25 | 1.55 | 2.04 |
| Trustworthiness of Content | 1.08 | 2.19 | 1.47 | 73 | 0.34 | 0.75 | 1.42 |
| Trust | 1.85 | 1.06 | 1.02 | 73 | 0.23 | 1.61 | 2.08 |
Figure 12: Importance ratings and their confidence intervals for Netflix.
Cronbach Alpha values for the scales:
Attractiveness: 0.95
Perspicuity: 0.80
Intuitive Use: 0.90
Visual Aesthetics: 0.89
Quality of Content: 0.84
Trustworthiness of Content: 0.87
Trust: 0.90
Results for Amazon Prime
Table 14: Scale means, standard deviation and confidence per scale for Amazon Prime.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 1.61 | 1.30 | 1.14 | 57 | 0.29 | 1.31 | 1.90 |
| Perspicuity | 1.62 | 1.99 | 1.41 | 57 | 0.37 | 1.26 | 1.99 |
| Intuitive Use | 1.36 | 1.90 | 1.38 | 57 | 0.36 | 1.00 | 1.71 |
| Visual Aesthetics | 1.01 | 1.64 | 1.28 | 57 | 0.33 | 0.68 | 1.34 |
| Quality of Content | 1.49 | 1.63 | 1.27 | 57 | 0.33 | 1.16 | 1.82 |
| Trustworthiness of Content | 1.46 | 1.47 | 1.21 | 57 | 0.31 | 1.15 | 1.78 |
| Trust | 0.71 | 2.99 | 1.73 | 57 | 0.45 | 0.26 | 1.16 |
Figure 13: Scale means and confidence intervals per scale for Amazon Prime.
Table 15: Importance ratings and confidence intervals for Amazon Prime.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 1.61 | 0.99 | 0.99 | 57 | 0.26 | 1.36 | 1.87 |
| Perspicuity | 2.27 | 0.71 | 0.83 | 57 | 0.22 | 2.05 | 2.48 |
| Intuitive Use | 1.86 | 1.09 | 1.03 | 57 | 0.27 | 1.59 | 2.13 |
| Visual Aesthetics | 1.11 | 1.88 | 1.36 | 57 | 0.35 | 0.75 | 1.46 |
| Quality of Content | 1.63 | 1.27 | 1.12 | 57 | 0.29 | 1.34 | 1.92 |
| Trustworthiness of Content | 1.49 | 1.29 | 1.13 | 57 | 0.29 | 1.20 | 1.78 |
| Trust | 1.91 | 1.01 | 1.00 | 57 | 0.26 | 1.65 | 2.17 |
Figure 14: Importance ratings and their confidence intervals for Amazon Prime.
Cronbach Alpha values for the scales:
Attractiveness: 0.90
Perspicuity: 0.91
Intuitive Use: 0.94
Visual Aesthetics: 0.94
Quality of Content: 0.82
Trustworthiness of Content: 0.87
Trust: 0.96
3.2.3 Programming Environments
Results for Eclipse
Table 16: Scale means, standard deviation and confidence per scale for Eclipse.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 0.48 | 2.98 | 1.71 | 14 | 0.90 | -0.41 | 1.38 |
| Dependability | 0.84 | 3.30 | 1.80 | 14 | 0.94 | -0.10 | 1.78 |
| Perspicuity | 0.11 | 2.86 | 1.68 | 14 | 0.88 | -0.77 | 0.99 |
| Efficiency | 0.71 | 2.39 | 1.53 | 14 | 0.80 | -0.09 | 1.52 |
| Usefulness | 1.21 | 3.08 | 1.74 | 14 | 0.91 | 0.30 | 2.13 |
| Adaptability | 1.25 | 2.48 | 1.56 | 14 | 0.82 | 0.43 | 2.07 |
| Value | 0.32 | 2.73 | 1.64 | 14 | 0.86 | -0.54 | 1.18 |
Figure 15: Scale means and confidence intervals per scale for Eclipse.
Table 17: Importance ratings and confidence intervals for Eclipse.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 1.21 | 1.72 | 1.26 | 14 | 0.66 | 0.55 | 1.88 |
| Dependability | 1.79 | 0.49 | 0.67 | 14 | 0.35 | 1.43 | 2.14 |
| Perspicuity | 1.57 | 1.19 | 1.05 | 14 | 0.55 | 1.02 | 2.12 |
| Efficiency | 1.57 | 1.34 | 1.12 | 14 | 0.58 | 0.99 | 2.16 |
| Usefulness | 1.00 | 1.23 | 1.07 | 14 | 0.56 | 0.44 | 1.56 |
| Adaptability | 1.38 | 1.09 | 1.00 | 14 | 0.53 | 0.86 | 1.91 |
| Value | 0.14 | 1.98 | 1.36 | 14 | 0.71 | -0.57 | 0.85 |
Figure 16: Importance ratings and their confidence intervals for Eclipse.
Cronbach Alpha values for the scales:
Attractiveness: 0.93
Dependability: 0.97
Perspicuity: 0.93
Efficiency: 0.90
Usefulness: 0.98
Adaptability: 0.96
Value: 0.93
Results for Visual Studio
Table 18: Scale means, standard deviation and confidence per scale for Visual Studio.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 1.67 | 0.69 | 0.83 | 29 | 0.30 | 1.37 | 1.97 |
| Dependability | 1.77 | 0.68 | 0.82 | 29 | 0.30 | 1.47 | 2.07 |
| Perspicuity | 0.93 | 1.35 | 1.16 | 29 | 0.42 | 0.51 | 1.35 |
| Efficiency | 1.44 | 1.05 | 1.02 | 29 | 0.37 | 1.07 | 1.81 |
| Usefulness | 2.00 | 0.92 | 0.96 | 29 | 0.35 | 1.65 | 2.35 |
| Adaptability | 1.78 | 0.83 | 0.91 | 29 | 0.33 | 1.45 | 2.11 |
| Value | 1.66 | 1.18 | 1.08 | 29 | 0.39 | 1.27 | 2.06 |
Figure 17: Scale means and confidence intervals per scale for Visual Studio.
Table 19: Importance ratings and confidence intervals for Visual Studio.

| Scale | Mean | Variance | Std. Dev. | N | Confidence | CI Lower | CI Upper |
|---|---|---|---|---|---|---|---|
| Attractiveness | 1.38 | 1.03 | 1.00 | 29 | 0.36 | 1.02 | 1.74 |
| Dependability | 2.07 | 0.92 | 0.94 | 29 | 0.34 | 1.73 | 2.41 |
| Perspicuity | 1.90 | 0.60 | 0.76 | 29 | 0.28 | 1.62 | 2.17 |
| Efficiency | 1.93 | 0.57 | 0.74 | 29 | 0.27 | 1.66 | 2.20 |
| Usefulness | 1.68 | 1.19 | 1.07 | 29 | 0.39 | 1.29 | 2.07 |
| Adaptability | 1.68 | 1.49 | 1.20 | 29 | 0.44 | 1.24 | 2.11 |
| Value | 0.79 | 2.32 | 1.50 | 29 | 0.54 | 0.24 | 1.33 |
Figure 18: Importance ratings and their confidence intervals for Visual Studio.
Cronbach Alpha values for the scales:
Attractiveness: 0.76
Dependability: 0.83
Perspicuity: 0.86
Efficiency: 0.80
Usefulness: 0.82
Adaptability: 0.80
Value: 0.79
3.2.4 Overall Results
The following Table 20 shows for all studies the mean and standard deviation of the rating
concerning the overall satisfaction and the calculated KPI. In addition, the correlation
between the overall satisfaction and the KPI is shown.
Table 20: Overall satisfaction and KPI for all products.

| Product | Satisfaction Mean | Satisfaction Std. Dev. | KPI Mean | KPI Std. Dev. | Corr |
|---|---|---|---|---|---|
| otto.de | 5.48 | 1.24 | 1.27 | 0.90 | 0.71 |
| zalando.de | 5.65 | 0.91 | 1.70 | 0.69 | 0.66 |
| Netflix | 6.06 | 0.99 | 1.73 | 0.74 | 0.77 |
| Amazon Prime | 5.30 | 1.08 | 1.35 | 0.87 | 0.78 |
| Eclipse | 4.21 | 1.74 | 0.40 | 1.37 | 0.83 |
| Visual Studio | 5.55 | 0.97 | 1.59 | 0.57 | 0.71 |
Thus, the KPI is a very good predictor of the overall satisfaction. This holds for all investigated products and all combinations of scales used. It may therefore be possible to establish a benchmark based on the KPI.
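The reported correlations are, presumably, plain Pearson correlations between the per-participant satisfaction ratings and the per-participant KPI values; as a sketch with hypothetical vector names:

```r
# satisfaction: per-participant overall satisfaction ratings (1..7)
# kpi:          per-participant UX KPI values for the same product
cor(satisfaction, kpi)  # e.g. 0.77 for the Netflix sample
```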
List of Figures

Figure 1: Results of PCA and Factorial Analysis for Aesthetics
Figure 2: Results of PCA and Factorial Analysis for Adaptability
Figure 3: Results of PCA and Factorial Analysis for Usefulness
Figure 4: Results of PCA and Factorial Analysis for Intuitive Use
Figure 5: Results of PCA and Factorial Analysis for Value
Figure 6: Results of PCA and Factorial Analysis for Content Quality
Figure 7: Scale means and confidence intervals per scale for Otto.de
Figure 8: Importance ratings and their confidence intervals for Otto.de
Figure 9: Scale means and confidence intervals per scale for Zalando.de
Figure 10: Importance ratings and their confidence intervals for Zalando.de
Figure 11: Scale means and confidence intervals per scale for Netflix
Figure 12: Importance ratings and their confidence intervals for Netflix
Figure 13: Scale means and confidence intervals per scale for Amazon Prime
Figure 14: Importance ratings and their confidence intervals for Amazon Prime
Figure 15: Scale means and confidence intervals per scale for Eclipse
Figure 16: Importance ratings and their confidence intervals for Eclipse
Figure 17: Scale means and confidence intervals per scale for Visual Studio
Figure 18: Importance ratings and their confidence intervals for Visual Studio
List of Tables

Table 1: Loadings of the candidate items for Aesthetics
Table 2: Loadings of the candidate items for Adaptability
Table 3: Loadings of the candidate items for Usefulness
Table 4: Loadings of the candidate items for Intuitive Use
Table 5: Loadings of the candidate items for Value
Table 6: Loadings of the candidate items for Content Quality
Table 7: Overview of participation and demographic information for all studies
Table 8: Scale means, standard deviation and confidence per scale for Otto.de
Table 9: Importance ratings and confidence intervals for Otto.de
Table 10: Scale means, standard deviation and confidence per scale for Zalando.de
Table 11: Importance ratings and confidence intervals for Zalando.de
Table 12: Scale means, standard deviation and confidence per scale for Netflix
Table 13: Importance ratings and confidence intervals for Netflix
Table 14: Scale means, standard deviation and confidence per scale for Amazon Prime
Table 15: Importance ratings and confidence intervals for Amazon Prime
Table 16: Scale means, standard deviation and confidence per scale for Eclipse
Table 17: Importance ratings and confidence intervals for Eclipse
Table 18: Scale means, standard deviation and confidence per scale for Visual Studio
Table 19: Importance ratings and confidence intervals for Visual Studio
Table 20: Overall satisfaction and KPI for all products
Bibliography
Boos, B. & Brau, H. (2017). Erweiterung des UEQ um die Dimensionen Akustik und Haptik. In: Hess, S. & Fischer, H. (Hrsg.), Mensch und Computer 2017 - Usability Professionals. Regensburg: Gesellschaft für Informatik e.V., pp. 321-327.
Hinderks, A. (2016). Modifikation des User Experience Questionnaire (UEQ) zur Verbesserung der Reliabilität und Validität. Unpublished master's thesis, University of Applied Sciences Emden/Leer.
Hinderks, A., Schrepp, M., Mayo, F. J. D., Escalona, M. J., & Thomaschewski, J. (2019). Developing a UX KPI based on the user experience questionnaire. Computer Standards & Interfaces.
Laugwitz, B., Schrepp, M. & Held, T. (2008). Construction and evaluation of a user experience questionnaire. In: Holzinger, A. (Ed.), USAB 2008, LNCS 5298, pp. 63-76.
Revelle, W. R. (2017). psych: Procedures for personality and psychological research. https://CRAN.R-project.org/package=psych
Winter, D., Hinderks, A., Schrepp, M. & Thomaschewski, J. (2017). Welche UX Faktoren sind für mein Produkt wichtig? In: Hess, S. & Fischer, H. (Hrsg.), Mensch und Computer 2017 - Usability Professionals. Regensburg: Gesellschaft für Informatik e.V., pp. 191-200.
Appendix: Screenshots of the used questionnaires

The following pages show the HTML questionnaires used, as full-page screenshots. The pages are shown in the original German version that was used for the data collection.