From customer survey feedback to software improvements: Leveraging the
full potential of data
Erik Bertram,1,2★ Nina Hollender,1 Sebastian Juhl,1,3 Sandra Loop,1 and Martin Schrepp1†
1SAP Deutschland SE & Co. KG, Hasso-Plattner-Ring 7, 69190 Walldorf, Germany
2Hochschule Fresenius Heidelberg, Sickingenstraße 63-65, 69126 Heidelberg, Germany
3University of Missouri, Columbia, USA
Version: 8 August 2024
ABSTRACT
Converting customer survey feedback data into usable insights has always been a great challenge for large software enterprises.
Despite improvements in this field, a major obstacle often remains in drawing the right conclusions from the data and
channeling them into the software development process. In this paper we present a practical end-to-end approach for extracting
useful information from a data set and leveraging that information to drive change. We describe how to choose the right metrics to
measure, gather appropriate feedback from customer end-users, analyze the data by leveraging methods from inferential statistics,
make the data transparent, analyze large volumes of user comments efficiently with Large Language Models, and finally drive
change with the results. Furthermore, we present an example of a UX dashboard that can be used to communicate the analyses
to stakeholders within the company.
Key words: Analytics – User Experience – Surveys – UX Dashboard – Standardized UX Questionnaires – UX Improvement
Process – GenAI for Comment Analysis
1 INTRODUCTION
“What gets measured gets managed” is a frequently cited quote that, however, is incorrectly attributed to the famous Austrian-American management consultant Peter Drucker.¹ While this quote originated from a direct critique of the narrow and lopsided utilization of performance measures in a business context, the importance of quantitative metrics for companies, especially large and globally acting software enterprises, is growing (see e.g., Bauer 2004; Morgan and Rego 2006; Van Looy and Shafagatova 2016).
As such, one of their main targets is to increase the adoption of
cloud services (Shimba 2010) and to keep a stable customer base to
drive revenue. To do so, every software company is eager to contin-
uously improve the experience of their cloud product portfolio and
deliver the world’s best software to customers.
Furthermore, in today’s competitive markets, user experience
(UX) quality is a prerequisite for the long-term success of products (see e.g., Sward and Macarthur 2007; Ross 2014). Over time,
competing products become more and more similar in terms of their
functionality. For products with great UX, the experience becomes
part of their brand, and therefore it is important to constantly measure
the UX quality of products.
★E-mail: erik.bertram@sap.com
†Published by the Gesellschaft für Informatik e.V. and the German UPA e.V. 2024 in T. Jackstädt, S.J. Wiedenroth & J. Hinze (Hrsg.): Mensch und Computer 2024 – Usability Professionals, 01.-04. September 2024, Karlsruhe. © 2024 copyright remains with the authors. http://dx.doi.org/10.18420/muc2024-up-177
¹The quote originated from the columnist Simon Caulkin, who summarized an argument in a paper by Ridgway (1956).
Furthermore, previous studies have revealed the role of change management and its impact on business processes and employees (see e.g., Mento et al. 2002; By 2005; Kettinger and Grover 1995; Hiatt and Creasey 2003; Schöllhorn et al. 2019). However, companies face some notable challenges. For example, channeling end-user feedback into the software development process in an agile manner remains a big problem. Furthermore, it is often unclear how the gap between survey feedback and business insights can be bridged, and how these insights can be leveraged for continuous code improvements.
In this article, we aim to present a general and scalable end-to-end
process and describe how companies can gather high-quality product
feedback from end-users, analyze existing data with statistical meth-
ods, draw reliable conclusions, and finally feed this information into
the development process.
For simplicity, we only use simulated data and dummy products, which we name Product A to Product F, to illustrate our approach. All UX questionnaires presented below are publicly available on the internet and a common standard in many software enterprises (see e.g., Laugwitz et al. 2006; Schrepp and Thomaschewski 2019; Fisher and Kordupleski 2019; Lewis et al. 2013). The article targets decision makers as well as developers, designers, and managers.
In general, we believe that establishing a smooth end-to-end process always requires a similar strategy, which we sketch out on the following pages. Readers are welcome to adapt individual ideas for their own purposes. Depending on the quality processes already existing in a company and the internal complexity of its product offerings, the described process needs to be adjusted.
2 LISTENING TO CUSTOMER END-USERS
In the following, we describe several requirements and challenges
that a company might face when listening to end-users with the help
of surveys, including how to decide on appropriate survey items and
corresponding UX metrics.
2.1 Using standardized questionnaires to measure UX
If we want to measure the user experience of a product quantitatively,
we need to develop a clear understanding of what this term means
semantically. The ISO 9241-210 norm (ISO 2019) defines UX as a
“[. . . ] person’s perceptions and responses that result from the use
or anticipated use of a product, system, or service.” This definition
entails two important aspects.
First, UX is a subjective perception of users concerning a prod-
uct. Therefore, we must ask users, and different users might express
different opinions. Such differences can, for example, result from
personal preferences or diverse levels of expertise with a product.
Thus, it is important to collect feedback from larger user samples to
adequately cover the spectrum of different opinions.
Second, UX covers a wide range of different impressions. It includes not only quality aspects associated with working on typical tasks with the product (e.g., efficiency, controllability, or learnability), but also aspects like fun of use or the aesthetic appeal of the user interface (see e.g., Schrepp 2021; Hinderks et al. 2021).
For many years now, the UX discipline has developed several tech-
niques to collect feedback from users. However, only a few options
exist to measure the UX quantitatively. Online surveys are a very
efficient tool to do so, since they allow collecting feedback from
large user groups with low effort and costs. To get high-quality data,
one needs to ensure that the measurement method fulfills common
quality criteria, such as, e.g., objectivity, reliability, and validity.
However, this is difficult to achieve if the questions are defined to
calculate a UX score from the answers ad hoc. Nevertheless, there
are many standardized UX questionnaires available that are carefully
constructed, validated, and described in scientific publications (see,
e.g., Schrepp 2021).
2.2 Choosing the correct UX questionnaires
As explained above, UX is a heterogeneous concept and there are
many questionnaires available that measure various aspects of UX.
Hence, the first challenge is to decide which questionnaires to use. In
general, we should measure those metrics that are most important for
our users and that relate to common business goals (Dalrymple et al. 2021; Sauro 2016). In addition, it is important to limit the number of
questions as much as possible to reduce the drop-out rate, since long
surveys can cause frustration on the end-user side. As an example, we
will discuss the selection of metrics to evaluate a product portfolio
below.
Business applications are used for professional work. Typical users
include, for example, developers of cloud applications, administra-
tors, analysts, decision-makers, sales representatives, or accountants.
For those users, both usability and usefulness are of high importance,
which is why we need to ensure that these metrics are measured adequately. Therefore, the UX-Lite questionnaire, which consists of the two questions shown in Figure 1, is a natural choice.
The UX-Lite (Lewis and Sauro 2021; Finstad 2010) realizes a
concept similar to the Technology Acceptance Model (Davis 1989),
which assumes that user acceptance of a new technology is based
on its perceived usefulness (first item of the UX-Lite) and perceived
Figure 1. The two questions of the UX-Lite questionnaire are answered on a five-point ordinal scale and measure the usefulness and usability of a product.
Category   Score Interval   Percentile
A+         84.1 – 100       96 – 100
A          80.8 – 84.0      90 – 95
A-         78.8 – 80.7      85 – 89
B+         77.2 – 78.8      80 – 84
B          74.1 – 77.1      70 – 79
B-         72.6 – 74.0      65 – 69
C+         71.1 – 72.5      60 – 64
C          65.0 – 71.0      41 – 59
C-         62.7 – 64.9      35 – 40
D          51.7 – 62.6      15 – 34
F          0.0 – 51.6       0 – 14

Table 1. Benchmark categories used for the UX-Lite, including score intervals and percentiles. Please note that American school grades are used to label the categories.
ease of use (second item of the UX-Lite). The answers are scored from 0 (Strongly disagree) to 4 (Strongly agree), and the sum of the two item scores is calculated per participant. Thus, we get a rating between 0 (worst impression) and 8 (best impression) per participant. This score is then transformed to a range between 0 and 100 by multiplying it by a factor of 100/8 = 12.5. By averaging over all participants, we finally get a UX-Lite score for the product, which is comparable to the SUS score (also ranging between 0 and 100) and therefore allows us to use the well-established SUS benchmark (Lewis and Sauro 2018; Lah et al. 2020). Hence, observed UX-Lite scores can be classified into the eleven categories shown in Table 1.
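As an illustration, the following minimal Python sketch computes a UX-Lite product score from simulated answers and looks up its benchmark category from Table 1; the data layout (a list of usefulness/ease-of-use pairs coded from 0 to 4) is an assumption for this example, not part of any specific survey tooling.

```python
# Minimal sketch: compute a UX-Lite product score from per-participant answers
# (each item scored 0 = "Strongly disagree" to 4 = "Strongly agree") and map it
# onto the benchmark categories from Table 1. The data layout is illustrative.

UX_LITE_BENCHMARK = [  # (lower bound of score interval, category), best first
    (84.1, "A+"), (80.8, "A"), (78.8, "A-"), (77.2, "B+"), (74.1, "B"),
    (72.6, "B-"), (71.1, "C+"), (65.0, "C"), (62.7, "C-"), (51.7, "D"),
]

def ux_lite_score(responses):
    """responses: list of (usefulness, ease_of_use) answers, each coded 0-4."""
    # Per participant: sum of both items (0-8), rescaled to 0-100 via 100/8 = 12.5.
    per_participant = [(usefulness + ease) * 12.5 for usefulness, ease in responses]
    # Product score: average over all participants.
    return sum(per_participant) / len(per_participant)

def benchmark_category(score):
    for lower_bound, category in UX_LITE_BENCHMARK:
        if score >= lower_bound:
            return category
    return "F"  # scores below 51.7

if __name__ == "__main__":
    simulated = [(4, 4), (3, 4), (3, 3), (4, 2), (2, 3)]  # five simulated participants
    score = ux_lite_score(simulated)
    print(f"UX-Lite score: {score:.1f} -> category {benchmark_category(score)}")
```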
For example, a value of 82 would indicate that the product can be assigned to category A (i.e., around 90% of the products in the benchmark data set show a lower and 5% a higher score, while the remaining 5% of products fall into the same category). Such a value indicates good quality. If, in contrast, a value of 45 is measured, the product would lie in category F (i.e., it is amongst the 14% of products in the benchmark data set that show the lowest scores), indicating poor quality.
To keep users engaged and motivated, it is also important that they perceive a product as interesting and enjoyable. Only few questionnaires measure this joy of use. We selected the UEQ-S (Laugwitz et al. 2008; Schrepp et al. 2017), a short version of the User Experience Questionnaire consisting of eight semantic differential items, shown in Figure 2. The eight items are grouped into two subscales: the first four items measure the usability of the product, the last four items fun of use or interest.
Items are scored from −3 to +3 from left to right. The score for each of the two subscales is simply the mean over all of its items and all participants. The overall score is the mean value over all eight items.
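Analogously, a minimal sketch of the UEQ-S scoring described above, assuming the answers are already coded from −3 to +3 and stored as one row of eight item scores per participant:

```python
# Minimal sketch: UEQ-S scoring, assuming answers are already coded from -3 to +3
# and stored as one row of eight item scores per participant
# (items 1-4 = usability/pragmatic quality, items 5-8 = fun of use/hedonic quality).
import numpy as np

def ueq_s_scores(answers):
    """answers: array-like of shape (participants, 8) with values in [-3, +3]."""
    data = np.asarray(answers, dtype=float)
    return {
        "usability": data[:, :4].mean(),   # mean over items 1-4 and all participants
        "fun_of_use": data[:, 4:].mean(),  # mean over items 5-8 and all participants
        "overall": data.mean(),            # mean over all eight items
    }

if __name__ == "__main__":
    simulated = [
        [2, 1, 2, 3, 1, 0, 1, 2],
        [1, 2, 1, 1, -1, 0, 1, 0],
        [3, 2, 2, 2, 2, 1, 2, 1],
    ]
    print(ueq_s_scores(simulated))
```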
The UEQ-S also provides a benchmark (see Table 2) that classifies the scores measured for the subscales into the five categories Excellent, Good, Above Average, Below Average, and Bad. The logic behind this benchmark is similar to the one used for the UX-Lite: an extensive reference data set of more than 450 studies was split into these five groups.
Figure 2. The UEQ-S is a short survey with eight semantic differential questions, answered on a seven-point ordinal scale.
Category        User Experience   Usability       Fun of Use      Percentile
Excellent       1.58 – 3.00       1.74 – 3.00     1.59 – 3.00     91 – 100
Good            1.31 – 1.57       1.55 – 1.73     1.20 – 1.58     76 – 90
Above Average   0.98 – 1.30       1.17 – 1.55     0.85 – 1.19     51 – 75
Below Average   0.59 – 0.97       0.72 – 1.16     0.35 – 0.84     26 – 50
Bad             -3.00 – 0.58      -3.00 – 0.71    -3.00 – 0.35    0 – 25

Table 2. The UEQ-S benchmark consists of five categories (Excellent, Good, Above Average, Below Average, Bad). Shown are the score intervals for overall user experience, usability, and fun of use, together with the corresponding percentiles. Scales range from -3 to +3.
Figure 3. The PSAT question captures overall product satisfaction and is also answered on a five-point ordinal scale.
Clearly, the UX is not the only driver that affects the success of a product. Support and documentation quality or the availability of the application in the cloud are crucial aspects as well. To account for those, we added the PSAT (Product Satisfaction)² score, which is calculated from a single question, as indicated in Figure 3.
The PSAT score is calculated as the percentage of participants who answered Very satisfied or Satisfied, thus ranging from 0 (worst) to 100 (best). It is an easy-to-interpret metric that represents the overall satisfaction with a product. The American Customer Satisfaction Index (https://www.theacsi.org/) is based on the PSAT, and a yearly benchmark is published for different industry sectors, including software products.
For cloud products, a high contract renewal rate is crucial for generating revenue and achieving long-term business goals. Thus, the loyalty of users and customers is of vital importance. The NPS (Net Promoter Score) is a widely used method to measure customer loyalty and is therefore included as well. It is calculated from the single question shown in Figure 4.
²Note that in some instances, the PSAT score is referred to as 'Customer Satisfaction' score. For consistency, we use the term PSAT throughout this publication.
Figure 4. The NPS question is answered on an eleven-point ordinal scale from 0 (not at all likely) to 10 (very likely) and measures customer loyalty.
Responses are coded as Promoters (scores of nine or ten), Passives (scores of seven or eight), and Detractors (scores from zero to six). The NPS is calculated by subtracting the share of Detractors from the share of Promoters. The score ranges from −100 (worst possible) to +100 (best possible). The core idea of the NPS (Reichheld 2003) is that people who are very positively impressed by a product or service (Promoters) will recommend it to others and thus increase the user base.
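Both metrics are simple to compute. The following minimal sketch illustrates the PSAT and NPS calculations described above; the answer encodings (label strings and 0–10 integers) are illustrative assumptions.

```python
# Minimal sketch: PSAT and NPS as defined above.
# PSAT: share of participants answering "Very satisfied" or "Satisfied" (0-100).
# NPS: % Promoters (scores 9-10) minus % Detractors (scores 0-6), range -100 to +100.
# The answer encodings are illustrative assumptions.

def psat(satisfaction_answers):
    """satisfaction_answers: list of labels from the five-point PSAT item."""
    satisfied = sum(a in ("Very satisfied", "Satisfied") for a in satisfaction_answers)
    return 100 * satisfied / len(satisfaction_answers)

def nps(recommendation_scores):
    """recommendation_scores: list of integers from 0 to 10."""
    n = len(recommendation_scores)
    promoters = sum(s >= 9 for s in recommendation_scores)
    detractors = sum(s <= 6 for s in recommendation_scores)
    return 100 * (promoters - detractors) / n

if __name__ == "__main__":
    print(psat(["Very satisfied", "Satisfied", "Neutral", "Dissatisfied", "Satisfied"]))  # 60.0
    print(nps([10, 9, 8, 7, 6, 3, 10, 9]))  # (4 - 2) / 8 * 100 = 25.0
```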
These four metrics seem to cover the most central aspects for a
portfolio of cloud applications. However, depending on the type of
applications and the specific business goals, other combinations of
metrics might be required. Thus, one must carefully investigate the
importance of different requirements and select the proper metrics
accordingly.
A big advantage of using established survey items from standardized questionnaires is the availability of benchmarks, which consist of a larger set of product evaluations conducted with the corresponding questionnaire. Hence, it is possible to compare one's own results with the results of the products in the benchmark data set, which helps to interpret and contextualize the numbers.
2.3 Adding supplementary information
Metrics like those described above provide information about the
overall UX quality. However, the plain numbers do not explain which
product features must be improved to obtain better results in the fu-
ture. To collect information that helps to interpret the quantitative
results, we recommend including two open comment fields in the
survey. They ask participants to comment on the strong (What did
you like about the product?) and weak aspects of a product (What
should be improved?). Different users may have different opinions
concerning some product features. Thus, it is not unusual to receive
both positive and negative comments concerning the same product
feature. Therefore it is important to always ask for positive and neg-
ative feedback to avoid a biased impression. Analyzing these com-
ments helps to understand the ratings and is crucial to transfer the
interpretations into concrete actions.
Utilizing the capabilities of generative AI tools and Large Language Models like GPT-4, it is possible to automatically summarize and categorize a vast amount of user comments. This can save time and resources while allowing the extraction of valuable information from unstructured text data.
In addition, questions about the product usage can be captured
(e.g., role of the user, age, experience with the product, frequency
of use), which provide further insights into a product portfolio. For
example, one could imagine a development team that improved a
workflow for a specific user role in their product. While the overall
score across all user roles might not have improved notably, looking
at the scores for the target user role might show an improvement of
the score some time after the release of the feature.
2.4 Single vs. multiple surveys
In large software enterprises with a diverse product landscape and different UX maturity levels, additional challenges might arise. For example, some product teams might already have started to use surveys to collect end-user feedback, and these surveys might use different UX metrics than those described above. In such cases, teams might be reluctant to change to the new set of metrics, and indeed, the data collected in the past is a valuable source of information, since it offers a baseline against which measurements of new versions can be compared. At the same time, it is important to choose common metrics to be able to compare the UX quality of all products with each other. As a compromise, one could enhance the existing surveys with the most important new metrics, while ensuring that the new survey does not become too long. In our case, we decided to add at least the two UX-Lite questions and the PSAT question to all existing surveys to achieve some comparability.
In addition, individual product teams may want to ask product-specific questions to gain deeper insights. Such questions may be necessary only for a certain period of time. For example, if a new feature is delivered, it might be helpful to capture user satisfaction with this feature via a dedicated question in the survey. Once a conclusion has been reached from the answers, this question can be removed again to shorten the survey. Thus, a successful approach requires some openness for application-specific needs. Technically, this is often a challenge, since a substantial number of different surveys might be required that need to be consolidated for analyzing the common metrics.
2.5 Coordinating a cross-product initiative
Depending on the size of the organization, smaller groups may be needed to organize such a large-scale measurement project. Having a contact person for each product area, typically a UX designer, a product manager, or a small research operations team, helps to channel the survey into the respective development process. Having contacts for each product is also helpful when change is required, e.g., when you want to support auto-triggered in addition to user-triggered surveys. If code changes are required, these contact champions can then discuss them with their product teams. However, note that even with champions in place, involving multiple products can be a huge coordination effort. Keeping a record of who agreed to what and when is crucial for tracking the initiative.
3 GATHERING END-USER DATA
After deciding upon the right metrics to use, applying them to users
via dedicated surveys is a vital step in the whole measurement pro-
cess. In the following, we sketch several aspects that one should consider during the data collection phase.
3.1 Data protection regulations
Before reaching out to users via online surveys, we advise carefully checking existing data protection regulations. If the feedback is provided anonymously, capturing personal data in the sense of the General Data Protection Regulation (GDPR) can in most cases be avoided. However, one should ensure that specific information, such as IP addresses, demographic information, or the company name, is not collected carelessly. Open text fields, into which a user can enter personal information, are certainly a grey area. It is advisable to add a disclaimer in the survey instructions or directly above the comment field that instructs users to avoid this.
If you need to collect data that is detailed enough to identify a person, you must add a special data privacy statement to the survey and ensure that responses are stored only if the user agrees to this (especially in the European Union). In addition, you need to define a clear process that ensures that such personal information is deleted once it is no longer needed or after a certain amount of time. If the application is used by employees of corporate customers, you should also check whether contractual agreements allow sending the survey to these employees.
3.2 Ensuring an adequate feedback sample size
A challenge that might occur in the data collection phase is a weak feedback stream from end-users, resulting in small sample sizes. Several channels exist that can be used to ask for user feedback:
•Links or buttons directly in the UI of a product that open a feedback
form or a longer survey when a user clicks on these elements.
•Automatic triggers that ask users for feedback while they work with the product. For example, such a trigger could launch a dialog after the user has worked in the application for five minutes. The dialog asks the user to provide feedback, and if the user agrees, a feedback form is opened. It is also possible to trigger the request for feedback when users complete a task or perform a special action (a minimal sketch of such trigger logic is shown after this list).
•A link to a survey can be distributed via dedicated e-mail or social
media channels.
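As an illustration of such automatic triggers, the following minimal sketch shows one possible eligibility check. The five-minute threshold matches the example above; the cooldown period and the function itself are assumptions for this sketch, not a description of a specific product implementation.

```python
# Minimal sketch (illustrative assumption, not a product implementation): decide
# whether an automatic in-product feedback trigger may be shown to a user.
from datetime import datetime, timedelta

MIN_ACTIVE_TIME = timedelta(minutes=5)  # ask only after five minutes of work (see text)
COOLDOWN = timedelta(days=90)           # assumed: do not ask the same user again too soon

def may_show_feedback_dialog(session_start, now, last_survey_at=None):
    """Return True if the feedback dialog may be offered in this session."""
    if now - session_start < MIN_ACTIVE_TIME:
        return False  # user has not worked long enough in the application yet
    if last_survey_at is not None and now - last_survey_at < COOLDOWN:
        return False  # user was asked recently
    return True

if __name__ == "__main__":
    start = datetime(2024, 8, 8, 9, 0)
    print(may_show_feedback_dialog(start, start + timedelta(minutes=6)))        # True
    print(may_show_feedback_dialog(start, start + timedelta(minutes=6),
                                   last_survey_at=start - timedelta(days=10)))  # False
```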
Each of these channels has advantages and disadvantages. For example, feedback forms that are accessible in the UI (either launched by a passive feedback button or an active automatic trigger) are easy to implement and reach users while they are working with the product. Thus, the feedback is not provided in retrospect, but rather gives a direct impression of how the product is perceived by the customer. However, since users would rather work in the system than spend their time answering questions, they tend not to accept longer surveys in this situation (which is especially true when feedback is requested by automatic triggers). Hence, such in-system feedback mechanisms should be short to avoid high dropout rates. In contrast, feedback requests distributed via e-mail or links in social media channels allow longer surveys with more detailed questions.
Automatically triggered requests for feedback, in particular, are often perceived as disturbing or annoying. Thus, they can negatively influence the UX and may even prompt users to give poor ratings because they are angry about the interruption.
Feedback mechanisms launched by a passive button or a link typ-
ically generate less feedback. They are clicked if users detect a prob-
lem or positive behavior of the user interface and want to report it to
the development teams. Hence, there may be a bias in the responses
towards very positive or very negative ratings.
Some products may have a poor response rate because they have fewer users or because the users do not want to offer feedback. In this case, it can be necessary to collect feedback over all available channels, for example, via an e-mail campaign delivered by an account manager who knows the customer personally or by a well-known thought leader, a feedback button in the UI, or a request for feedback in social media channels.
To address these issues, we recommend implementing different feedback mechanisms for different products.
Figure 5. Different biases introduced by our method to collect end-user
feedback.
All products contain a feedback button that launches a short feedback form containing the PSAT and UX-Lite questions and a comment field. In addition, e-mail campaigns can be used to collect more details for important products. The survey launched from such an e-mail campaign might contain questions about demographic variables and usage behavior (e.g., experience, frequency of use, etc.), the four questionnaires PSAT, UX-Lite, UEQ-S, and NPS, together with two comment fields that ask for positive and negative feedback.
To take advantage of the benefits of the different question sets, you can also choose to have multiple surveys for a product, e.g., a short survey with standardized questions plus an open text field that is always available via a feedback button in the system, and a longer survey with the standardized questions plus some context questions, including product-specific questions, distributed via e-mail campaigns.
3.3 Biases introduced by the different channels
In general, one needs to be cautious when comparing UX metrics collected over different channels, since each method comes with an associated bias (see Figure 5). For example, let's assume that we place a feedback button in the UI. Users who are dissatisfied or satisfied with the product will click on this button more frequently than users with a neutral opinion. If you actively ask for feedback, you may motivate some users who otherwise would not have clicked the button, while other users might feel disturbed by such requests, introducing yet another bias. Hence, any feedback collected via online surveys will always contain some bias, since none of the feedback channels will deliver a representative sample of all user perceptions.
If you want to compare ratings obtained by different methods, you should be careful, as differences may be caused by the method used to collect the data rather than by true differences in the KPI. Moreover, we are usually interested in the ratings of all end-users, although in practice we only have survey responses from a small user sample. In this case, we want to take the sampled user information and extrapolate the results to the general user population, i.e., the set of all end-users. Nevertheless, since different users have different opinions, the results we receive are likely to change if we ask a different sample of users.
4 APPLYING STATISTICAL METHODS
In the following, we describe how one can use well-known statistical
methods from inferential statistics to gain a deeper understanding of
the UX perceptions of the users.
4.1 Analyzing the data
As a first step, we use the survey data to calculate aggregated mea-
sures, such as mean values, standard errors, and proportions of the
KPIs outlined above. These descriptive measures allow us to gain
insights into how the users evaluate their experiences with the prod-
ucts. However, we are not only interested in how the respondents of our surveys evaluate the UX of our products. Rather, we would like to know how the overall user population rates the UX. The mean values we calculate from the sample may not be representative of all users, since it is possible that we asked a very specific subset of users by chance. Had we asked a different sample of users, our results might have looked different. Consequently, we need to account for the uncertainty caused by asking only a sample and not the entire user population.
4.2 Learning about the general user population
A simple example illustrates this problem. Consider a product with 1,000 end-users in total and assume that the product creates a neutral impression concerning UX. We would like to know how satisfied these users are with the product on average. If we had asked all 1,000 users to rate their product satisfaction on a five-point scale, we would see that the true average is $\mu = 3$.
In reality, however, we do not have responses from all 1,000 end-users, but only from a small sample of, e.g., 50 randomly selected users. Taking the responses from these 50 users, we might get an average satisfaction rating of $\bar{x}_1 = 2.76$, calculated as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $n$ is the total number of responses and $x_i$ denotes an individual response. While this value is fairly close to the true mean of 3, we could have drawn a different sample of 50 users, which would have yielded different results. In fact, taking a second sample from all 1,000 users and calculating the average product satisfaction provides a score of $\bar{x}_2 = 3.24$. In theory, we could have asked yet another sample of users and most likely received again a different average product satisfaction score.
By repeating this process of randomly sampling 50 users from all 1,000 end-users 100 times and recording the respective average satisfaction score, we would get results like those shown in Figure 6. Clearly, our assessment of the end-users' product satisfaction differs depending on the sample used. Each time we ask a different sample of users, we get slightly different results. While the values fluctuate around the true value of 3, we end up with values between 2.54 and 3.32, simply because we happened to ask a specific sample of users by pure chance.
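The repeated-sampling exercise behind Figure 6 can be reproduced with a short simulation. The following sketch draws 100 samples of 50 users each from a simulated population of 1,000 ratings; the response distribution is an arbitrary illustrative assumption with a mean of 3.

```python
# Minimal sketch: reproduce the repeated-sampling illustration behind Figure 6.
# We simulate 1,000 satisfaction ratings (1-5) with mean mu = 3, then repeatedly
# draw samples of 50 users and record the sample means. The distribution is an
# arbitrary illustrative assumption.
import numpy as np

rng = np.random.default_rng(seed=42)

# Simulated population of 1,000 ratings, symmetric around 3.
population = rng.choice([1, 2, 3, 4, 5], size=1000, p=[0.1, 0.2, 0.4, 0.2, 0.1])

sample_means = [rng.choice(population, size=50, replace=False).mean()
                for _ in range(100)]

print(f"population mean: {population.mean():.2f}")
print(f"sample means range from {min(sample_means):.2f} to {max(sample_means):.2f}")
```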
To account for this, we utilize different methods from inferential
statistics in order to use the responses from the sampled users to
learn about the general user population. With these techniques, we
can draw conclusions about the general user population and quantify
the uncertainty caused by asking only a small subset of users.
4.3 Confidence Intervals
Since the sample mean $\bar{x}$ may not equal the population mean $\mu$, we calculate confidence intervals, indicating the range in which the true population mean $\mu$ lies with a pre-defined level of confidence.
Figure 6. Drawing 100 samples of 50 users each from a user population with $\mu = 3$ and recording the respective sample means $\bar{x}_1$ to $\bar{x}_{100}$. In this case, the sample means range from $\bar{x}_7 = 2.54$ to $\bar{x}_{82} = 3.32$.
Using the sample mean $\bar{x}$ and the sample variance $s^2$, confidence intervals for means are calculated by

$$\bar{x} \pm t_{(df,\,\alpha/2)} \times \sqrt{\frac{s^2}{n}}, \qquad (1)$$

where $n$ represents the number of survey responses and $t_{(df,\,\alpha/2)}$ corresponds to the critical value derived from a two-tailed $t$-distribution, with $\alpha$ being the desired threshold for statistical significance and $df$ representing the degrees of freedom, given by $df = n - 1$.³ Since we stick to the convention of reporting 95% confidence intervals, we use $\alpha = 0.05$. The sample variance is given by

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2. \qquad (2)$$
Equation (1) defines the lower and upper bound of the 95% confidence interval, which is symmetric around the point estimate $\bar{x}$ by design. The formula also shows that the width of the confidence interval is determined by two factors: the number of responses $n$ and the variance $s^2$. The more responses we have, the narrower the confidence interval and the more precise our estimate of $\mu$. Conversely, the larger the variability in survey responses, the wider the confidence interval.
Constructing confidence intervals around proportions (e.g., for the PSAT score) requires a slightly adjusted formula. Specifically, we use a standard normal distribution rather than the $t$-distribution to derive the critical value and replace the sample variance $s^2$ by $\hat{p}(1-\hat{p})$, where $\hat{p}$ is the estimated proportion from the sample:

$$\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \qquad (3)$$

Again, the more responses we receive, the narrower the confidence interval. Furthermore, for a given $n$, the confidence interval is widest if $\hat{p}$ is close to 0.5 and becomes narrower as $\hat{p}$ approaches 0 or 1.

³The $t$-distribution is a generalization of the normal distribution. While it is also a continuous and symmetric probability distribution, it has more mass at its tails if $n$ is finite. This makes the results of statistical tests more conservative, as the critical value a test statistic needs to reach to be statistically significant is higher than the critical value derived from the standard normal distribution. As $n \to \infty$, the $t$-distribution approaches the normal distribution.
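Equations (1) to (3) translate directly into code. The following minimal sketch mirrors the formulas above and uses scipy (an assumed dependency) only to obtain the critical values; it is an illustration, not a replacement for a full statistics library.

```python
# Minimal sketch: 95% confidence intervals for a mean (Equations 1 and 2) and for
# a proportion (Equation 3). scipy is used only for the critical values.
import math
from scipy import stats

def mean_ci(x, alpha=0.05):
    n = len(x)
    mean = sum(x) / n
    s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)          # Equation (2)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)             # two-tailed critical value
    half_width = t_crit * math.sqrt(s2 / n)                   # Equation (1)
    return mean - half_width, mean + half_width

def proportion_ci(successes, n, alpha=0.05):
    p_hat = successes / n
    z_crit = stats.norm.ppf(1 - alpha / 2)
    half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)  # Equation (3)
    return p_hat - half_width, p_hat + half_width

if __name__ == "__main__":
    ratings = [3, 4, 2, 5, 3, 3, 4, 2, 3, 4]                  # simulated satisfaction ratings
    print("mean CI:", mean_ci(ratings))
    print("PSAT CI:", proportion_ci(successes=30, n=50))      # 30 of 50 users satisfied
```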
While confidence intervals provide important information on the range in which the true value $\mu$ would lie (given a specific level of confidence) if we had asked all users, many business use cases require the comparison of KPIs across products or over time. For example, we may be interested in knowing whether the PSAT score of a product increased from one quarter to the next. Alternatively, we may be interested in knowing whether users rate the usefulness of one product higher than that of another product.
4.4 Hypothesis Testing
Confidence intervals have only limited utility in answering these
questions. The reason is that the overlap in confidence intervals is not
sufficient to gauge whether or not differences are caused by chance
alone. Even if confidence intervals do overlap, their difference may
still be statistically significant (see, e.g., Schenker and Gentleman
2001).
To overcome this limitation, we utilize additional techniques from inferential statistics for hypothesis testing. Specifically, we want to know whether a KPI measured for two groups differs in the user population, based on asking only a small subset of users. Denoting the true population parameters (the values we would obtain if we had asked all users) of a KPI by $\mu_1$ and $\mu_2$, we test the null hypothesis of no difference in means. The corresponding alternative hypothesis is undirected and states that both population parameters differ:

$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \neq \mu_2$$
For the difference in means, we perform a two-tailed independent-samples Welch's $t$-test to evaluate the null hypothesis. In contrast to Student's $t$-test, this method performs better if the two groups have unequal variances and their sample sizes differ, as it does not use a pooled variance estimate but rather a combination of the group variances (Welch 1947).
We first use the groups' sample means $\bar{x}_1$ and $\bar{x}_2$ and their sample variances $s^2_{x_1}$ and $s^2_{x_2}$, as defined in Equation (2), to calculate the test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s^2_{x_1}}{n_1} + \dfrac{s^2_{x_2}}{n_2}}}. \qquad (4)$$
Equation (4) illustrates that, all else being equal, $t$ becomes larger in magnitude as i) the difference in observed means increases, ii) the variances in the samples decrease, and iii) the number of responses per group increases.
Next, we determine the critical value that the test statistic $t$ needs to exceed in order to be statistically significant. To this end, we again set the threshold for statistical significance to $\alpha = 0.05$ and approximate the degrees of freedom by the Welch–Satterthwaite equation (e.g., Satterthwaite 1946, 1941):
$$df = \frac{\left(\dfrac{s^2_{x_1}}{n_1} + \dfrac{s^2_{x_2}}{n_2}\right)^2}{\dfrac{s^4_{x_1}}{n_1^2 (n_1 - 1)} + \dfrac{s^4_{x_2}}{n_2^2 (n_2 - 1)}}. \qquad (5)$$
Using $\alpha$ and $df$, we derive the critical value $t_{(df,\,\alpha/2)}$ from a $t$-distribution. If $|t| > t_{(df,\,\alpha/2)}$, the difference we find in the sample is statistically significant. Consequently, we can reject $H_0$ and conclude that the true population parameters $\mu_1$ and $\mu_2$ differ, based on the samples we used and the pre-defined confidence level. Conversely, if $|t| \leq t_{(df,\,\alpha/2)}$, we cannot conclude that the difference we find between $\bar{x}_1$ and $\bar{x}_2$ also exists in the user population. That is, there might be a true difference between $\mu_1$ and $\mu_2$, but given the data we have, we cannot be confident enough that the difference we find in the sample is also present in the overall user population.
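The test statistic in Equation (4) and the Welch–Satterthwaite degrees of freedom in Equation (5) can be implemented in a few lines. The following sketch cross-checks the result against scipy's built-in Welch test (ttest_ind with equal_var=False); the sample data are simulated.

```python
# Minimal sketch: two-tailed Welch's t-test following Equations (4) and (5),
# cross-checked against scipy's implementation. Sample data are simulated.
import numpy as np
from scipy import stats

def welch_t_test(x1, x2, alpha=0.05):
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)                    # sample variances, Equation (2)
    t = (x1.mean() - x2.mean()) / np.sqrt(v1 / n1 + v2 / n2)   # Equation (4)
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)  # Equation (5)
    )
    t_crit = stats.t.ppf(1 - alpha / 2, df=df)
    return t, df, abs(t) > t_crit

if __name__ == "__main__":
    q1 = [3, 4, 2, 3, 3, 4, 2, 3]  # simulated ratings, quarter 1
    q2 = [4, 4, 5, 3, 4, 5, 4, 4]  # simulated ratings, quarter 2
    t, df, significant = welch_t_test(q1, q2)
    print(f"t = {t:.2f}, df = {df:.1f}, significant: {significant}")
    print(stats.ttest_ind(q1, q2, equal_var=False))            # should agree
```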
To test the statistical significance of a difference in proportions (e.g., between two PSAT scores), we follow the same basic steps. However, instead of Equation (4), we use the observed proportions from the samples, $\hat{p}_1$ and $\hat{p}_2$, and the following formula to derive the test statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}}. \qquad (6)$$
Furthermore, instead of a $t$-distribution, we derive the critical value $z_{\alpha/2}$ from a standard normal distribution, using the same threshold for statistical significance of $\alpha = 0.05$. Again, if $|z| > z_{\alpha/2}$, the difference in the proportions is statistically significant and our analysis indicates that there is a true difference in the user population. If $|z| \leq z_{\alpha/2}$, we cannot rule out the possibility that any difference we find in the samples is caused by chance alone and that there is no difference in the user population.
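Analogously, a minimal sketch of the two-tailed z-test for a difference in proportions from Equation (6), using the unpooled standard error given above; the counts in the example are simulated.

```python
# Minimal sketch: two-tailed z-test for a difference in proportions,
# following Equation (6) with the unpooled standard error given above.
import math
from scipy import stats

def proportion_z_test(successes1, n1, successes2, n2, alpha=0.05):
    p1, p2 = successes1 / n1, successes2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # denominator of Equation (6)
    z = (p1 - p2) / se
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return z, abs(z) > z_crit

if __name__ == "__main__":
    # Simulated example: 70 of 100 users satisfied this quarter vs. 55 of 100 last quarter.
    z, significant = proportion_z_test(70, 100, 55, 100)
    print(f"z = {z:.2f}, significant at alpha = 0.05: {significant}")
```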
While identifying statistically significant changes over time or
across products is crucial for data-driven decision-making in a busi-
ness context, it is important to note that statistical significance does
not imply substantive significance. A very minor change in a KPI
might be statistically significant (e.g., because the sample size is
huge or there is little variation among survey responses), but of such
a small magnitude that it is substantively negligible. These statistical
hypothesis tests provide pivotal information in order to avoid basing
decisions on random fluctuations of sampled data but they should
not be viewed in isolation from the relevant business context.
4.5 Analysing Comments with GenAI
Large language models (LLMs) are powerful tools for efficiently analyzing user comments about a product. The release of ChatGPT revolutionized natural language processing (see e.g., Cao et al. 2023; Bandi et al. 2023). GenAI tools like ChatGPT can be applied to different tasks to enhance productivity. The aspect of GenAI that is especially interesting for our purposes is that it allows analyzing large amounts of user comments automatically. Without this capability, comment analysis becomes quite time-consuming and difficult once the number of comments exceeds a certain limit, and this limit is easily reached if data are collected via feedback buttons in heavily used professional applications.
We currently use natural language processing techniques mainly for two purposes. First, comments concerning an application are automatically categorized into topics by a model that is trained on existing human classifications of comments. The topics are then available in the dashboard to search and filter the list of comments collected for an application. This helps, for example, product owners or developers to quickly analyze the comments concerning their application and to derive insights about frequently mentioned UX problems or ideas for improvements.
Second, the content of the collected comments must also be communicated to interested stakeholders, and in this case we typically cannot expect them to read and analyze individual comments. Managers of an area are of course interested in checking what users say, but cannot go through 100 comments per application in their area of responsibility. Thus, ChatGPT is used to create summaries of the comments. These summaries are contained in the quarterly reports and are also shown in a special area of the dashboard.
All comments for a quarter are downloaded into a list. An instruction that describes what should be done is added on top of the list, and the file is then handed as a prompt to ChatGPT. The resulting summary is then quickly checked by a UX expert. Such a manual final check is advisable: it does not require much effort, but it ensures that the result really covers the content of the comments. Especially if the number of comments per quarter and product is low, there is a risk that single comments are overrepresented in the summary. We have also established a minimum number of comments that must be available before a summary is created. This is especially important if a new product is onboarded and only a small number of comments is available for the current quarter.
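As an illustration of this workflow, the following sketch prepends a summarization instruction to a quarter's comments and sends them to an OpenAI chat model via the openai Python package. The instruction text, the minimum-comment threshold, and the model name are illustrative assumptions, not the actual prompt or setup used in our process.

```python
# Minimal sketch (illustrative assumptions, not our actual prompt or tooling):
# prepend a summarization instruction to the quarter's comments and send them to
# an OpenAI chat model. Requires the openai package (>= 1.0) and OPENAI_API_KEY.
from openai import OpenAI

MIN_COMMENTS = 20  # assumed threshold; the text only states that a minimum is enforced

INSTRUCTION = (  # assumed instruction, not the example referenced above
    "Summarize the following end-user comments about a software product. "
    "List the most frequently mentioned strengths and the most frequently "
    "mentioned problems or improvement requests, without quoting individual users."
)

def summarize_comments(comments, model="gpt-4"):
    if len(comments) < MIN_COMMENTS:
        return None  # too few comments; skip the summary for this quarter
    prompt = INSTRUCTION + "\n\n" + "\n".join(f"- {c}" for c in comments)
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content  # to be reviewed by a UX expert

if __name__ == "__main__":
    comments = [f"Comment {i}: the navigation on the overview page is confusing." for i in range(25)]
    print(summarize_comments(comments))
```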
5 MAKING THE DATA AND THE ANALYSES
TRANSPARENT
Typically, the UX of a product cannot be improved by a single person alone. Instead, the whole team needs to be accountable for increasing product satisfaction. Thus, we recommend that everyone in a company, from executives to individual contributors, gets access to the data and the analyses to help improve the UX.
5.1 Data storytelling using a dashboard and a report
To achieve this goal, one could create a dashboard (e.g., by using a
tool like SAP Analytics Cloud) that displays all the KPIs and auto-
matically creates the relevant statistical measures described above. In
addition, one could also create a UX measurement report and make
it accessible for everybody in the company. A dashboard can be great
for deeper analysis to look for areas to research, and if it supports
filtering, showing historical trends of the KPIs or context information
gathered in the surveys it can provide huge value for employees.
Figure 7 and Figure 8 illustrate what a UX dashboard can look like in practice, using imaginary product names and mock data. While Figure 7 showcases an example screen for the PSAT score, Figure 8 provides information on the UEQ score. The example UX dashboard shown here has a header with the option to switch between different KPIs. Business users interested in, e.g., the Net Promoter Score can easily navigate to this KPI by clicking on the respective button in the header and find more detailed information there. In the product satisfaction screen, the stacked bar chart displays the distribution of users grouped by their level of satisfaction over time. If the PSAT score drops, for example, this chart allows users to see whether previously satisfied users have become neutral or rather dissatisfied. The other three charts provide information on the share of satisfied users across different levels of aggregation. The example shows that
users who frequently interact with the products are less satisfied than
users who use the products just occasionally.
For the UEQ example screen shown in Figure 8, the evolution
of some scores over time is shown as three line charts, allowing
the users to look at the overall UX ratings as well as the users’
evaluations of the products’ pragmatic and hedonic qualities. Again,
the example screen also features a product-level split alongside the
split by frequency of use.
In addition to the information provided by the different charts,
there are several filtering options available in the header that allow
the user to take a closer look at a specific subset of the data. For
example, consider the scenario where you are working on a product and want to learn more about how it is perceived on the customer side. In the dashboard, you could filter for this specific product and recognize that customer satisfaction has dropped since last quarter, but upon further inspection you observe that it is simply returning to its usual level after a previous spike in the score. You could
slice-and-dice to see what specific user roles say about the product.
For example, you might find that administrators like the product,
while DevOps colleagues are not as enthusiastic. Additionally, you
might find an interesting pattern where people who are new to the
product are not happy, yet the longer people have been using it, the
more they like it. This could be explained by new users struggling to learn the product and how to make use of it; once they have mastered it, their satisfaction increases. You might also see that one company
contributed many responses, and all scores from this company lie
below the average value. This could happen when a company en-
courages their users to respond to the survey to have their thoughts
heard. Reading through the free text-field comments can help you to
confirm this. Such an analysis could trigger many ideas for further
research.
Moreover, a report can be helpful for those who just want a status update or a snapshot of what users think of the products. A report is generally easy to consume, particularly if it draws attention to important information. For instance, the report could highlight significant changes since the previous report. It could also provide a product summary page with the KPIs for each product, including a sampling of the comments from the surveys. One may decide whether to host the reports in an internal, broadly accessible location or to also send out each report to a distribution list once released.
5.2 Fostering adoption through appropriate data visualization
To ensure that stakeholders take away the appropriate insights, the dashboard and reports must be easy to consume. The International Business Communication Standards (IBCS; Hichert and Faisst 2019) emphasize standard charting notation and minimal use of colors except when semantically relevant, which helps make reports and dashboards easier to understand. If you are using SAP Analytics Cloud, you could also follow the ten golden rules of dashboard design described in Bertram et al. (2021). Furthermore, one needs to ensure that the dashboard does not suffer from major performance issues, which might lead to low adoption of the application (Bertram et al. 2024).
For many product teams to use these resources to improve the products, adoption needs to be high. Change management theory suggests involving stakeholders early so that you can hear their concerns and pivot appropriately (Yeoh et al. 2008). You could take early designs of the dashboard to the product teams and run usability tests of the dashboard to hear their concerns. If you hear that some UX teams fear being judged based on the outcome, you might convey this concern to the executive sponsors so that they can make clear that no team will be judged and that the entire product team is responsible for making improvements. In addition, you could organize onboarding sessions explaining how to slice and dice the data and how to interpret the statistics.
Another concern might be that adoption could drop when some UX improvements do not result in immediate score improvements. In this case, the roll-out messaging could include that people should trust that UX is important and that, with enough effort and iterations, the scores will eventually rise.
To ensure that all members of the product teams are aware of the dashboard and reports, the location of these resources should be widely evangelized, e.g., by referring to them in portals, meetings, internal conferences, newsletters, etc. Clearly, they should be accessible to everyone without the additional overhead of granting access rights.
6 DRIVING CHANGE WITH INSIGHTS
Ultimately, the UX KPIs are measured to drive improvements in the
products. Even though these are UX KPIs, the UX team alone should neither be seen as the enforcer nor be held solely accountable for product improvements.
To achieve the necessary level of product experience improvements, there are more steps along the way; gathering UX KPIs is merely the first. The product teams should set target KPI values and periodically review the current state. If the targets are not reached, the product teams should decide on what actions to take. Gathering survey responses typically will not indicate how to change the product, but merely points out that change is needed, and the comments may point to areas to research. Finally, the teams must agree on which topics to research next, who should be involved, which insights deserve new design improvements, and how to prioritize future backlog items for implementation. If any step is halted, the improvements may not be achieved. These steps can be owned by different departments or roles, yet there should be alignment among them.
Different roles may have different viewpoints. For example, the UX
team might want the UX improvements to be implemented, while the
sales team wants new features to be added, and the dev team wants
to support a new technology. All viewpoints can be valid and should
be balanced. If the UX KPIs are reasonably high, the team can agree to reduce the focus on UX improvements and instead put the focus on other areas of the product. However, if the UX KPIs are
quite low, the product team should agree on what UX KPI target to
hit and in which timeframe, and then support activities to reach this
goal.
There needs to be trust among the team members, who all share the same goal of improving the product, even though different team members may have different ideas about how to get there. Team members should be open-minded and willing to see the perspectives of others. If the product team jointly sets the UX goal, they jointly control how quickly they move and how much effort they are willing to invest.
After the improvements are implemented and released, further survey responses will assess these improvements. Note that the scores may take time to rise, because fixing a single issue will not cause a drastic change in scores. Only through consistent and sustained improvements will users start rating the products higher.
Figure 7. This UX prototype dashboard is an example screen showing the product satisfaction scores and their 95% confidence intervals for six products over
time. The charts at the bottom show the scores sorted by customer and frequency of use. By combining different filters at the top one can try to get more
information out of the survey data and analyze different use cases.
Figure 8. Another UX prototype dashboard showing the UEQ scores and their 95% confidence intervals for six products over time. It also shows the split
between the products’ pragmatic and hedonic qualities. Again, by combining different filter settings at the top one can try to extract additional information out
of the data set.
7 OUTLOOK
A best-case scenario for gathering UX feedback with the help of
surveys might look like this: UX KPIs are well established and ac-
cepted in a company, from the board-level to the product teams. This
is because UX KPIs and the improvement of UX KPIs have proven
to correlate with other business relevant KPIs, such as contract re-
newals, or monthly active users, and product teams notice that their
joint efforts of investing in UX projects lead to increased UX KPIs.
Executives understand that UX KPIs often require more than just a
few quarters to improve. Product teams understand that improving
the UX is a joint responsibility, involving all disciplines. Improving
the UX can concern various aspects: It may range from investing
in improving the stability, reliability, or performance of a product,
to harmonizing workflows across different areas of a product or to
integrating innovative features, to name just a few examples. Prod-
uct teams use the open text feedback to identify areas of improve-
ment, and apply different methods to fill knowledge gaps, qualitative
user research being one of them. They also iterate constantly with
users when designing new workflows or features. Users continuously
provide feedback because they notice that the company takes the
feedback seriously and invests in projects to improve the UX.
We believe there are a few factors that increase the chance of reaching such a best-case scenario, many of them summarizing points made above:
•Ensure high-level executive buy-in for the UX KPIs early on.
•Include the UX KPIs in the general company KPIs and KPI setting
process.
•Involve stakeholders on all levels early on in setting up the KPIs, the
process, and your assets.
•Establish a network of champions across different products that help
to implement and evangelize your UX KPI project.
•Make the data from the UX surveys transparent and accessible for
everyone with the help of dashboards and reports.
•Provide support and enablement for teams to make use of the data.
•Analyze the relationship between UX KPIs and business relevant
KPIs and make them transparent.
•Analyze the impact of UX improvement initiatives on UX KPIs and
celebrate if they were successful.
•Continuously gather feedback from your stakeholders and improve
your initiative.
Every company is different and may require different approaches and focus areas. However, the end-to-end process described in this article has worked for us so far and comprises generic factors that should apply to software companies in general. Gathering UX feedback from users is a key prerequisite for improving the UX and, eventually, further business-relevant KPIs.
Moving forward, recent advancements in the field of generative AI, especially large language models, provide enormous opportunities to derive further insights and automate the interpretation of user feedback in the form of comments. Automatically creating summaries of comments, for example, greatly facilitates the processing of text data. Additionally, the categorization of comments as well as sentiment analysis can help to generate further insights with relatively low effort. In combination with the other UX metrics, the automated analysis of user comments provides a comprehensive overview of how users perceive the UX of a product. Leveraging these insights facilitates the effective planning of product improvements.
ACKNOWLEDGEMENTS
The authors gratefully acknowledge the support of Viola Stiebritz
on the graphical design of some figures in this publication. We also
thank Michael Ameling for useful comments on the text.
REFERENCES
Ajay Bandi, Pydi Venkata Satya Ramesh Adapa, and Yudu Eswar Vinay
Pratap Kumar Kuchi. 2023. The power of generative ai: A review of re-
quirements, models, input–output formats, evaluation metrics, and chal-
lenges. Future Internet 15, 8 (2023), 260.
Kent Bauer. 2004. KPIs-The metrics that drive performance management.
Information Management 14, 9 (2004), 63.
E. Bertram, J. Charlton, N. Hollender, M. Holzapfel, N. Licht, and C. Padu-
raru. 2021. Designing Dashboards with SAP Analytics Cloud. Rhein-
werk Publishing, Incorporated. https://books.google.de/books?id=p69tzgEACAAJ
E. Bertram, C. Dannenhauer, M. Holzapfel, S. Loop, and S. Range. 2024.
SAP Analytics Cloud Performance Optimization Guide. SAP PRESS.
https://books.google.de/books?id=D1djzwEACAAJ
Rune Todnem By. 2005. Organisational change management: A critical re-
view. Journal of change management 5, 4 (2005), 369–380.
Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S Yu, and
Lichao Sun. 2023. A comprehensive survey of ai-generated content
(aigc): A history of generative ai from gan to chatgpt. arXiv preprint
arXiv:2303.04226 (2023).
M. Dalrymple, S. Pickover, and B. Sheppard. 2021. Made to measure: Get-
ting design leadership metrics right. https://www.mckinsey.com/capabilities/mckinsey-design/our-insights/made-to-measure-getting-design-leadership-metrics-right
F.D. Davis. 1989. Perceived Usefulness, Perceived Ease of Use, and User
Acceptance of Information Technology. MIS Quarterly 13, 3 (1989),
319–340.
K. Finstad. 2010. The Usability Metric for User Experience. Interacting
with Computers 22, 5 (2010), 323–327.
Nicholas I Fisher and Raymond E Kordupleski. 2019. Good and bad market
research: A critical review of Net Promoter Score. Applied Stochastic
Models in Business and Industry 35, 1 (2019), 138–151.
Jeff Hiatt and Timothy J Creasey. 2003. Change management: The people
side of change. Prosci.
R. Hichert and J. Faisst. 2019. Gefüllt, gerahmt, schraffiert: Wie visuelle
Einheitlichkeit die Kommunikation mit Berichten, Präsentationen und
Dashboards verbessert. Vahlen. https://books.google.de/books?id=eKmWDwAAQBAJ
A. Hinderks, D. Winter, M. Schrepp, and J. Thomaschewski. 2021. Appli-
cability of User Experience and Usability Questionnaires. Journal of
Universal Computer Science 25, 13 (2021), 1717–1735.
ISO. 2019. Ergonomics of human-system interaction – Part 210: Human-
centred design for interactive systems. Standard. International Organiza-
tion for Standardization, Geneva, CH.
William J Kettinger and Varun Grover. 1995. Toward a theory of business pro-
cess change management. Journal of management information systems
12, 1 (1995), 9–30.
Urška Lah, James R. Lewis, and Boštjan Šumak. 2020. Perceived Usability
and the Modified Technology Acceptance Model. International Journal
of Human-Computer Interaction 36, 13 (2020), 1216–1230. https://doi.org/10.1080/10447318.2020.1727262
Bettina Laugwitz, Theo Held, and Martin Schrepp. 2008. Construction and
evaluation of a user experience questionnaire.. In Symposium of the Aus-
trian HCI and usability engineering group. Springer, 63–76.
Bettina Laugwitz, Martin Schrepp, and Theo Held. 2006. Konstruktion eines
Fragebogens zur Messung der User Experience von Softwareprodukten..
In Mensch & Computer. 125–134.
J.R. Lewis and J. Sauro. 2018. Item benchmarks for the System Usability
Scale. Journal of Usability Studies 13, 3 (2018), 158–167.
J.R. Lewis and J. Sauro. 2021. Measuring UX: From the UMUX-Lite to the UX-Lite. https://measuringu.com/from-umux-lite-to-ux-lite (retrieved May 2022).
James R Lewis, Brian S Utesch, and Deborah E Maher. 2013. UMUX-
LITE: when there’s no time for the SUS. In Proceedings of the SIGCHI
conference on human factors in computing systems. 2099–2102.
Anthony Mento, Raymond Jones, and Walter Dirndorfer. 2002. A change
management process: Grounded in both theory and practice. Journal of
change management 3, 1 (2002), 45–59.
Neil A Morgan and Lopo Leotte Rego. 2006. The value of different customer
satisfaction and loyalty metrics in predicting business performance. Mar-
keting science 25, 5 (2006), 426–439.
F. F. Reichheld. 2003. The one number you need to grow. Harvard business
review 81, 12 (2003), 46–55.
V. F. Ridgway. 1956. Dysfunctional Consequences of Performance Mea-
surements. Administrative Science Quarterly 1, 2 (1956), 240–247.
http://www.jstor.org/stable/2390989
Jim Ross. 2014. The business value of user experience. Cranbury: D3
Infragistics (2014).
Franklin E. Satterthwaite. 1941. Synthesis of Variance. Psychometrika 6, 5
(1941), 309–316. https://doi.org/10.1007/BF02288586
F. E. Satterthwaite. 1946. An Approximate Distribution of Estimates of
Variance Components. Biometrics Bulletin 2, 6 (1946), 110–114. http://www.jstor.org/stable/3002019
J. Sauro. 2016. The Challenges and Opportunities of Measuring the User
Experience. Journal of Usability Studies 12, 1 (2016), 1–7.
Nathaniel Schenker and Jane F. Gentleman. 2001. On Judging the Significance
of Differences by Examining the Overlap Between Confidence Intervals.
The American Statistician 55, 3 (2001), 182–186. http://www.jstor.org/stable/2685796
Larissa Schöllhorn, Maria Lusky, Denis Villmen, and Dinu Manns. 2019.
UX-Monitoring für Unternehmen-Wie ein Tool zum Langzeitmonitoring
von UX neue Mehrwerte in Unternehmen schafft. Mensch und Computer
2019-Usability Professionals (2019).
M. Schrepp. 2021. User Experience Questionnaires: How to use question-
naires to measure the user experience of your products? Kindle Direct
Publishing, ISBN-13: 979-8736459766.
M. Schrepp, A. Hinderks, and J. Thomaschewski. 2017. Design and evaluation
of a short version of the user experience questionnaire (UEQ-S). Inter-
national Journal of Interactive Multimedia and Artificial Intelligence 4,
6 (2017), 103–108.
Martin Schrepp and Jörg Thomaschewski. 2019. Eine modulare Erweiterung
des User Experience Questionnaire. Mensch und Computer 2019-
Usability Professionals (2019).
Faith Shimba. 2010. Cloud computing: Strategies for cloud computing adop-
tion. Technological University Dublin.
David Sward and Gavin Macarthur. 2007. Making user experience a business
strategy. In E. Law et al.(eds.), Proceedings of the Workshop on Towards
a UX Manifesto, Vol. 3. Citeseer, 35–40.
Amy Van Looy and Aygun Shafagatova. 2016. Business process performance
measurement: a structured literature review of indicators, measures and
metrics. SpringerPlus 5, 1 (2016), 1–24.
Bernard Lewis Welch. 1947. The Generalization of 'Student's' Problem When
Several Different Population Variances are Involved. Biometrika 34, 1-2
(01 1947), 28–35. https://doi.org/10.1093/biomet/34.1-2.28
William Yeoh, Andy Koronios, and Jing Gao. 2008. Managing the implemen-
tation of business intelligence systems: a critical success factors frame-
work. International Journal of Enterprise Information Systems (IJEIS)
4, 3 (2008), 79–94.