
Personalised Recommendations in Mental Health Apps: The Impact of Autonomy and Data Sharing

Svenja Pieritz
Telefonica Alpha, Spain
Mohammed Khwaja∗†
Imperial College London, UK
A. Aldo Faisal
Imperial College London, UK
Aleksandar Matic
Koa Health, Spain
Abstract
The recent growth of digital interventions for mental well-being prompts a call to arms to explore the delivery of personalised recommendations from a user’s perspective. In a randomised placebo study with a two-way factorial design, we analysed the difference between an autonomous user experience and personalised guidance, with respect to both users’ preferences and their actual usage of a mental well-being app. Furthermore, we explored users’ preferences for sharing their data to receive personalised recommendations, by juxtaposing questionnaires and mobile sensor data. Interestingly, self-reported results indicate a preference for personalised guidance, whereas behavioural data suggest that a blend of autonomous choice and recommended activities results in higher engagement. Additionally, although users reported a strong preference for filling out questionnaires instead of sharing their mobile data, the data source did not have any impact on actual app use. We discuss the implications of our findings and provide takeaways for designers of mental well-being applications.
CCS Concepts: Human-centered computing → Human computer interaction (HCI); Empirical studies in ubiquitous and mobile computing.
Keywords: User Perception; Personalisation; Recommender Systems; Personality Traits
ACM Reference Format:
Svenja Pieritz, Mohammed Khwaja, A. Aldo Faisal, and Aleksandar Matic. 2021. Personalised Recommendations in Mental Health Apps: The Impact of Autonomy and Data Sharing. In CHI Conference on Human Factors in Computing Systems (CHI ’21), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 12 pages.

∗Both authors contributed equally to this work.
†The author is also a part of Koa Health, Spain.
CHI ’21, May 8–13, 2021, Yokohama, Japan
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8096-6/21/05...$15.00
Digital mental well-being interventions promise to mitigate the global shortage of mental healthcare professionals in a cost-effective and scalable manner [ ]. Their emergence has been accelerated by the COVID-19 pandemic [ ] and its impact on mental health [ ]. The growth of the digital mental health space has been paralleled by a rapid increase in research and development of new interventions and content. The benefits of rich content are indisputable, yet a vast number of choices can also backfire, a phenomenon known as the paradox of choice [ ]. For this reason, recommender systems are now emerging in digital mental health platforms as well [ ]. Personalised recommendations are an important aid against choice overload and can also improve intervention effectiveness. Even so, delivering them entails two main challenges. First, how to balance autonomy and personalised guidance has become an important topic in the design of personalised technologies. Second, data sharing concerns are inseparable from automatic personalisation models. Both challenges are particularly relevant for digital health applications [2].
User autonomy is one of the common principles in designing digital experiences [ ]. By contrast, patients in traditional doctor-patient settings typically expect (and often prefer) to “be told what to do” rather than to “do what they want”. This raises an ethical tension between ensuring patients’ safety and respecting their right to autonomy [ ]. In addition, data privacy, like autonomy, is a central theme in personalised technologies. This is especially true for digital services that rely on behavioural signals and sensitive mental health data to personalise interventions. The associated challenges are many, among them:
- unintended data leakage;
- users’ limited technical literacy;
- the need to balance less privacy-invasive monitoring against more tailored interventions that could improve health outcomes.
arXiv:2101.08375v1 [cs.HC] 21 Jan 2021
We empirically investigate the multifaceted challenges of autonomy and data sharing in mental health applications from the users’ point of view. Understanding the user’s perspective matters because user disengagement is one of the key obstacles to improving the effectiveness of digital mental health interventions [ ]. As with pharmacological therapies, no matter how personalised and efficient a digital intervention is, user adherence is a prerequisite for receiving the desired benefits. As the content in mental health applications grows, we are likely at the dawn of an expansion of personalised recommender systems [ ]. The question of how to design the user experience of delivering personalised recommendations therefore deserves an important place in Human Computer Interaction (HCI) research. Our objective is to inform digital user experience designers on how best to promote user engagement when providing diverse digital mental health content. To this end, we explore users’ declared preferences and their actual app usage with respect to:
1) a primarily autonomous versus a primarily guided user experience, and
2) the data to be shared in order to receive recommendations.
Specifically, we address the following research questions:
Do users prefer an autonomous or a guided experience in a mental health app?
Does receiving an autonomous versus a guided experience impact actual app use?
To power a recommendation system, do users prefer to share smartphone data or to self-report their personality traits?
Does sharing smartphone data, as opposed to self-reporting personality traits, influence actual app use?
We used a commercially available mental health application that includes more than 100 activities (i.e. interventions) and delivered it to participants. In a two-factor factorial experiment, we randomly assigned half of the participants to a guided user experience and the other half to an autonomous selection of mental well-being app content. Independently, we assigned half of the participants to a self-reported way of capturing a user model and half to a consent form for sharing smartphone data that could be used to infer the same user model. We used the Big Five personality traits [ ] as the user model for two reasons: personality has been widely used to personalise digital health solutions [ ], and it can be inferred passively from smartphone sensing data [ ]. The participants were primed to expect personalised recommendations based on the data they agreed to share. In reality, however, the recommendations were random.

We opted for a randomised placebo design based on priming to reduce the dependency on the accuracy of the recommendation system. That accuracy may not be uniform across users and would therefore act as a confounding factor. Having four randomised groups allowed us to examine the relative differences in actual app usage and users’ declared preferences as a function of the two factors: autonomy and data sharing.
The choices facing mental health intervention designers are not trivial, especially in light of ethical tensions related to paternalistic design choices [ ] and the possible risks arising from increasingly sensitive data streams. Yet both design choices must be tackled to unlock the value of personalised technology [18]. This paper deepens the understanding of users’ preferences and their actual app usage as a consequence of these design choices. It also contributes to the related debates in the HCI community and beyond.
Blom and Monk dened personalisation as "a process that increases
personal relevance" [
]. Personalisation has gained signicant at-
tention in digital services, since providing targeted user experience
has been shown to increase acceptance [
]. Particularly in health
applications, personalisation was shown to increase not only en-
gagement but also eectiveness and ultimately well-being. Noar et
al [
] conducted a meta-analysis of 57 studies that used tailored
messages to deliver health behaviour change interventions (for
smoking cessation, diet, cancer screening, etc.), and concluded that
personalised health interventions are more eective than generic
ones. Zanker et al [
] argued that personalisation can impact a
range of outcomes including user engagement, app behaviours,
and adoption rates. Recent studies have also found that person-
alisation of digital health apps can signicantly improve health
outcomes [
], however, the manner in which personalisation
is delivered to the users and how they perceived it can be even
more important than the extent to which a service is really person-
alised [
]. Our work builds on the previous literature by further
exploring the topic of delivering personalised recommendations in
digital mental health from the users’ perspective. We explored both
users’ preference as well as how their engagement with the app
are impacted by a) dierent ways of providing personalised recom-
mendation—by giving users more or less autonomy in choosing the
app content, and b) dierent ways of sharing the data required for
delivering personalisation. Our study highlights the importance of
autonomy and data privacy in the design of digital mental health
services and provides key takeaways for user experience design.
2.1 Autonomy
Autonomy has been an important focus in HCI, and specically
in persuasive technologies. Rughinis et al. [
] decoupled ve di-
mensions of autonomy in the context of health and well-being
apps including: (1) degree of control that the user has; (2) degree of
functional personalisation; (3) degree of truthfulness and reliabil-
ity of the information in the app; (4) users’ understanding of the
goal-pursuit and (5) promotion of moral values by what the app
recommends. Embedding autonomy in the design of digital services
impacts not only motivation and user experience but also psycholog-
ical well-being. For this reason, Peters et al [
] included autonomy
as one of the three key principles in “designing for well-being” (in
addition to competence and relatedness), using Self Determination
Theory [
] as the basis for their approach. For instance, game
designers have long explored the concept of autonomy and showed
that the perceived autonomy in video games contributes to game
enjoyment and also short-term well-being [
]. While autonomy
leads to improved well-being and engagement (in addition to being
ethically recommended [
]), providing a range of choices may act
as a demotivating factor [
]. Besides, providing more guidance
with tailored interventions can lead to improved eectiveness of
the intervention. Hence, designers of personalised applications face
conicting requirements.
In this study, we set out to explore how the degree of autonomy affects users’ subjective preferences as well as their engagement with a mental health application.
2.2 Data Privacy
Data privacy and related topics have been extensively discussed over the past decade due to rapid technological expansion. These topics include, but are not limited to, transparency, control, and data governance. They gained even more prominence after the introduction of the EU’s General Data Protection Regulation (GDPR) [ ]. The HCI community has promptly focused its efforts on understanding how these topics may affect interaction with digital services. Providing personalised recommendations typically relies on sensitive information streams, and past studies indicate that users’ attitudes towards sharing potentially sensitive data are very conservative [ ].

For mobile health apps specifically, Peng et al. [ ] conducted six focus groups and five individual interviews with 44 participants to identify what users value most in these kinds of apps. Participants valued the benefits of personalisation, yet the authors found that they were strongly hesitant to share personal information to receive those benefits. In another study, HCI researchers ran a “Wizard of Oz” experiment. It investigated whether the benefits of highly personalised services (ads in particular) offset concerns related to sharing personal data [ ]. Interestingly, participants’ concerns were less pronounced when an actual benefit of sharing the data was clearly visible. However, users’ concerns about how the system inferred the user model (concretely, their personality) remained prominent in semi-structured interviews.

On a related topic, a recent study [ ] used a mixed-methods approach to explore how users perceived automatic personality detection. The authors first conducted a survey (with 89 participants) to understand which data streams users were willing to share. They then developed a machine learning model (with the preferred data from 32 participants) to predict personality traits. Finally, they interviewed 9 participants to understand how users perceived the personality prediction model after seeing its results. They observed that participants’ opinions on data sharing were mixed, and suggested that transparency can help address users’ concerns such as trust and privacy.
In our randomised placebo study, we primed participants to believe that the selection of recommended activities in a mental health app was personalised based on their personal data. The goal was to explore whether the benefits of a personalised experience would outweigh their concerns about sharing the data. The success of the placebo effect was evaluated and confirmed by including a control group in the experiment. We additionally contribute to the existing literature by comparing actual app engagement with users’ stated preferences towards data sharing.
Data privacy and autonomy have been emphasised as key topics in the ethics of digital well-being [18]. To the best of our knowledge, our work is the first to thoroughly explore how these two elements affect users’ actual app usage and self-declared preferences in a digital mental health app.
To understand users’ preferences and their usage of a mobile mental health app in the context of delivering recommendations, we used Foundations. Foundations was a suitable platform for our study because it contains a large library with numerous intervention activities. In this section, we detail the methodology applied in this experiment.
Figure 1: (a) Open library with all activities. Some activities are locked and dependent on the completion of others. (b) Recommended activities at the bottom of the home screen. All activities shown to users are randomly selected.
3.1 Mental Health App
Foundations is an evidence-based digital mental health platform designed to improve users’ resilience and decrease their stress levels. At the time of this study, the app incorporated 10 modules with 102 intervention activities in total. Each activity has a specific format, such as simple blog posts, relaxation audios, interactive journaling, and games. The activities help users relax, sleep better, boost their self-confidence, think positively, and so on. The app provides an open library in which some activities are locked at the beginning (Figure 1 (a)). Upon completing each activity, users are asked to rate their experience with a thumbs-up or thumbs-down icon.

The home screen contains a section called "Other activities for you" that shows two recommended activities at a time (Figure 1 (b)). In our study, these recommendations were random, i.e. not personalised (although presented as such). This guaranteed that all users received the same experience. Automatic recommendations may work better for specific groups of users, which would have biased the results of our study.
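The placebo recommendation logic described above amounts to uniform random sampling that ignores all user data. Below is a minimal sketch of that idea, not Foundations' actual implementation. Only the library size (102) and the two recommendations shown at a time come from the text. The activity identifiers, the function name `placebo_recommendations`, and the choice not to model locked activities are all illustrative assumptions.

```python
import random

# Hypothetical stand-in for the app library: the text reports 102 activities,
# but these identifiers are invented for illustration.
LIBRARY = [f"activity_{i:03d}" for i in range(1, 103)]


def placebo_recommendations(library, k=2, rng=None):
    """Return k distinct activities drawn uniformly at random.

    No user data (personality scores, sensing data) is consulted, so every
    participant goes through the same non-personalised process, whatever
    they were told during onboarding. Locked activities are not modelled.
    """
    rng = rng or random.Random()
    return rng.sample(library, k)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the sketch is reproducible
    print("Other activities for you:", placebo_recommendations(LIBRARY, rng=rng))
```

The draw never looks at the participant. Any difference between groups therefore cannot stem from recommendation quality, which is the confound the design aims to remove.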
3.2 Study Design
We designed a study to determine how the way of data sharing and the autonomy of the user experience affect both users’ preferences and their actual usage of a mental well-being app. It consisted of three parts (Figure 2):
(1) an onboarding questionnaire;
(2) app usage for seven days with daily reminders;
(3) an exit questionnaire.
The goal was to investigate the effect of two variables, "autonomy" and "preferred way of data sharing", so we designed a two-factor factorial experiment. A two-factor factorial design is an experimental design in which data are collected for all possible combinations of the levels of the two factors of interest [ ]. In our case, each factor has two levels.

For the preferred way of data sharing, the two levels are (1) selecting mobile sensing data and (2) completing a questionnaire, for building a personalisation model. For the former level, half of the users (randomly selected) were asked to select smartphone data streams that could be used to automatically infer their personality. The other half received the 20-item personality questionnaire [ ] to complete. We also defined two different user experiences, which we refer to as "the degree of autonomy":
(1) a primarily guided user experience, with the option to choose other activities from an open library;
(2) a primarily autonomous user experience, with the option to use recommended activities on the home screen.

Overall, this led to 4 experimental groups that can be combined according to the variables they have in common. The combination of two groups that share one variable level is referred to as a cluster. For example, the two groups that receive an autonomous user experience, but differ in the way of data sharing, are combined and referred to as the autonomous cluster. This design provides one group for each combination of the two variables, which enables an analysis of all conditions both separately and combined. We assumed an effect size (Cohen’s d) of 1, statistical power of 95%, and a significance level of 0.05. Under these settings, the estimated sample size required for the Mann-Whitney test is 30 per group. We therefore set the criterion of having at least 30 participants in each group.
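The authors computed the sample size with the Real Statistics Mann-Whitney power function in Excel. As a rough, independent cross-check, the sketch below estimates by Monte Carlo simulation the power of a two-sided Mann-Whitney U test with 30 participants per group. This is a swapped-in technique, not the authors' method. It assumes normally distributed outcomes with unit variance that differ by a location shift of d = 1. The function name `simulated_power` and the simulation settings are illustrative assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def simulated_power(n_per_group, effect_size, alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo power estimate for a two-sided Mann-Whitney U test.

    Assumes both groups are normal with unit variance, so a mean shift of
    `effect_size` corresponds to Cohen's d.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect_size, 1.0, n_per_group)
        p = mannwhitneyu(control, treated, alternative="two-sided",
                         method="asymptotic").pvalue
        if p < alpha:
            rejections += 1
    return rejections / n_sims


power_30 = simulated_power(n_per_group=30, effect_size=1.0)
print(f"Estimated power with 30 participants per group: {power_30:.3f}")
```

Under these assumptions the estimate should come out close to the 95% target. Lowering `n_per_group` shows how quickly power falls off for smaller groups.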
The groups diered in the onboarding questionnaire and in daily
reminders during the app usage. The primary purpose of the on-
boarding questionnaire was to give the user the impression that
the collected data will be the base for receiving personalised recom-
mended activities in the app. However, this questionnaire was solely
used for priming, and no actual personalisation was occurring in
the app. All the activities participants found in the recommendation
section of the app were randomly selected, as described in section
3.1. The onboarding questionnaire consisted of the data sharing
(smartphone modalities or questionnaire) and directions on app
usage (autonomous or guided).
Upon completion of the questionnaire, all participants received
instructions on how to install Foundations and were asked to com-
plete at least one activity a day for one week. Daily reminders were
sent according to the degree of autonomy. These reminders con-
sisted of either a daily recommended activity for participants in
the guided cluster, or a general reminder to use the app for those
in the autonomous cluster. The daily recommended activities were
selected from the most popular activities in the app’s library. After
seven days, all users completed the exit questionnaire—which was
identical for all groups.
Since we did not use the users’ data to personalise recommenda-
tions in the app, we included an additional control group to verify
whether or not the priming was successful. The control group lled
out a control questionnaire to match the workload to the other
groups but this group did not receive any priming on personalisa-
In summary, this design resulted in having ve groups:
Questionnaire-Guided (QG): Personality questionnaire + daily email with activity recommendation + priming that the email recommendations are based on the reported personality
Data-Guided (DG): Data modality selection + daily email with activity recommendation + priming that the email recommendations are based on the automatically inferred personality
Questionnaire-Autonomous (QA): Personality questionnaire + daily email as a general reminder to complete one activity + priming that recommendations on the home screen are based on the reported personality
Data-Autonomous (DA): Data modality selection + daily email as a general reminder to complete one activity + priming that recommendations on the home screen are based on the automatically inferred personality
Control (C): Control questionnaire + daily email as a general reminder to complete one activity
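The two-factor layout above, plus the control, can be written as a small lookup from group label to factor levels. The clusters used in the analysis then fall out directly. The sketch below is illustrative. The labels follow Table 1, but the balanced shuffle-and-deal allocation in the hypothetical `assign_groups` helper is an assumption: the text only states that participants were randomly assigned.

```python
import random
from collections import Counter

# Factor levels per group: (data-sharing level, autonomy level).
# The control group (C) received no personalisation priming.
GROUPS = {
    "QG": ("questionnaire", "guided"),
    "DG": ("data", "guided"),
    "QA": ("questionnaire", "autonomous"),
    "DA": ("data", "autonomous"),
    "C": (None, None),
}


def assign_groups(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the five groups.

    Hypothetical balanced randomisation; group sizes differ by at most one.
    """
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    labels = list(GROUPS)
    return {pid: labels[i % len(labels)] for i, pid in enumerate(ids)}


def cluster(level):
    """Groups sharing one factor level, e.g. 'autonomous' -> ['QA', 'DA']."""
    return [name for name, levels in GROUPS.items() if level in levels]


allocation = assign_groups(range(700))
print(Counter(allocation.values()))
print("Autonomous cluster:", cluster("autonomous"))
print("Data cluster:", cluster("data"))
```

Here 700 registrants split evenly at allocation time. The analysed group sizes in Table 1 are uneven because of attrition after allocation.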
Our study was approved by the internal ethics board. The whole set of intervention activities in Foundations has recently been evaluated in a randomised controlled trial [ ] and demonstrated an improvement in users’ overall well-being. No harm was therefore expected from a deception study that recommends the most popular activities to users.
3.3 Data Collection
The onboarding and exit questionnaires were created using the survey collection tool. We designed five variations of the onboarding questionnaire, one for each of the five groups defined in Section 3.2. In each of these questionnaires, participants were presented with a consent form explaining the data collection and the purpose of the study, in compliance with the EU General Data Protection Regulation (GDPR). For users in the questionnaire cluster, the personality questionnaire used a 7-point Likert scale (1 = strongly disagree to 7 = strongly agree). Users in the data cluster were shown 10 different smartphone sensing data categories. They were asked to select at least 4 that could be sampled from their smartphones.

The data choice was introduced to resemble the choice that users have in real-world applications: Android and iOS let users opt out of specific data streams. Moreover, in Europe, where we conducted the experiments, this is a strict regulatory requirement under the GDPR. We selected the 10 sensing modalities most commonly used in previous literature to predict personality traits [7–9, 30, 40, 63]. The 10 options were:
Time spent with dierent applications (App time)
Geographical location (Location)
Number of steps walked (Steps)
Noise in the environment (Noise)
Bluetooth and WiFi data (Bluetooth/Wi)
Battery level (Battery)
Ambient Light in the environment (Light)
Call history (without phone numbers) (Calls)
Frequency of social network usage (Social network)
Phone lock/unlock data and screen usage (Un(Lock))
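The selection rule for the data cluster (at least 4 of the 10 modalities listed above) amounts to a simple validation step. The following is a hypothetical sketch of such a check, not the survey tool's configuration. The constant and function names are invented, and the modality labels are the short names from the list.

```python
# Short names of the 10 sensing modalities offered to the data cluster.
MODALITIES = (
    "App time", "Location", "Steps", "Noise", "Bluetooth/WiFi",
    "Battery", "Light", "Calls", "Social network", "Un(Lock)",
)
MIN_SELECTED = 4  # participants had to pick at least 4 of the 10


def validate_selection(selected):
    """Return the sorted, de-duplicated selection or raise ValueError."""
    chosen = set(selected)
    unknown = chosen - set(MODALITIES)
    if unknown:
        raise ValueError(f"Unknown modalities: {sorted(unknown)}")
    if len(chosen) < MIN_SELECTED:
        raise ValueError(
            f"Select at least {MIN_SELECTED} of {len(MODALITIES)} modalities"
        )
    return sorted(chosen)


print(validate_selection(["Steps", "Noise", "Calls", "Light"]))
# ['Calls', 'Light', 'Noise', 'Steps']
```

Duplicates are collapsed before counting. Selecting the same modality twice therefore does not help a participant reach the minimum of four.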
Figure 2: Experimental Design
We asked users to select at least 4 of the 10 options and explained that selecting more options leads to higher accuracy in inferring personality. After onboarding, users were asked to use Foundations for a week. App usage logs were recorded for each user during the study, including completed activities and the time taken per activity.
After using the app for an entire week, the users were presented with an exit questionnaire. This questionnaire had four sections, asking users about:
- their overall experience of the mental health app and their perspectives on personalisation of the app;
- whether they prefer to have autonomy in selecting activities or to have the app select the right activity for them;
- whether they prefer to complete a personality questionnaire or provide smartphone sensing data;
- their privacy preferences regarding the same.
The first set of questions was based on the Technology Acceptance Model [ ] and aimed to understand how users perceived the app in general. The second and third sets of questions related to users’ preference for being guided versus having autonomy, and for sharing data through a questionnaire versus providing their mobile sensing data. The questionnaire also included questions related to privacy concerns. A recent study that explored personality profiling by a chatbot indicated that participants generally regarded personality as sensitive data that they would be reluctant to share [61].

These questions were answered either on a 7-point Likert scale or as multiple choice (select ’X’ or ’Y’). Additionally, two free-text questions let users suggest how the app could be improved and made more personalised to them. Subsequently, we presented the participants with demographic questions: gender, age³, education level, and continent of residence. The exit questionnaire concluded with a text block that debriefed the participants.
³We asked for an age range rather than an exact age.
3.4 Participants and Inclusion Criteria
The participants in our study were recruited through an external agency that operates in Europe. The inclusion criteria were high proficiency in English and a minimum age of 18. We also required a minimum of 30 participants in each group and gender balance. In early July 2020, the recruitment agency sent an invitation to the study through its internal mailing list, and all participants completed the study by the end of July 2020. All participants were recruited from Europe. Through the recruitment agency, we provided a monetary incentive to all participants who completed the study. Users were told that successfully completing the study, and thus receiving the incentive, required three steps:
- completing the onboarding questionnaire;
- installing and using the mental health app for 1 week;
- completing the exit questionnaire.
Users were reminded each day that skipping any of these steps would result in their disqualification from the study.
In total, 700 participants registered for the study and were randomly assigned to one of the five groups. Each was asked to complete the onboarding questionnaire corresponding to their group. All 700 users completed the onboarding questionnaire and were then instructed to install the app on their smartphones. Of these, 353 installed the mental health app. For one week after installing the app, users received daily reminders to use the app and to engage with at least one activity per day. Using the app for 4 or more days qualified users for the last stage of the study. We chose this threshold because any shorter period would be insufficient to explore the app well. A total of 241 participants fulfilled this criterion and were directed to the exit questionnaire. Finally, 218 users completed the exit questionnaire, and this population was used for our analyses. The demographics of the participants are provided in Table 1. Every group had at least 40 participants, exceeding the required minimum of 30 per group. The demographic distribution indicates that the sample involved a diverse population.
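The attrition reported above can be summarised as a funnel. This small sketch recomputes stage-to-stage and overall retention from the counts in the text. It also checks the analysed group sizes from Table 1 against the minimum of 30 per group. The variable and function names are illustrative.

```python
# Participant counts at each stage, as reported in Section 3.4.
FUNNEL = [
    ("Registered and completed onboarding", 700),
    ("Installed the app", 353),
    ("Used the app for 4 or more days", 241),
    ("Completed the exit questionnaire", 218),
]
# Analysed group sizes from Table 1.
GROUP_SIZES = {"QG": 45, "DG": 52, "QA": 40, "DA": 40, "C": 41}
MIN_PER_GROUP = 30


def retention(stages):
    """Yield (stage, n, share of previous stage, share of first stage)."""
    first = prev = stages[0][1]
    for name, n in stages:
        yield name, n, n / prev, n / first
        prev = n


for name, n, step, overall in retention(FUNNEL):
    print(f"{name:<38}{n:>5}  step {step:6.1%}  overall {overall:6.1%}")

# Consistency checks between the text and Table 1.
assert sum(GROUP_SIZES.values()) == FUNNEL[-1][1]  # 218 analysed in total
assert min(GROUP_SIZES.values()) >= MIN_PER_GROUP
```

The final row shows that roughly 31% of registrants (218 of 700) remained in the analysed sample. The largest single drop occurred between registration and app installation.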
Table 1: Demographics of participants
Demographic  Category          Total   QG   DG   QA   DA    C
Sample size                      218   45   52   40   40   41
Gender       Female              113   24   25   21   17   26
             Male                105   21   27   19   23   15
Age          15-19#                6    -    1    1    2    2
             20-24                26    3    3    8    5    7
             25-29                23    6    3    4    7    3
             30-34                26    3    9    4    6    4
             35-39                25    7    7    3    4    4
             40-44                31    6    9    5    3    8
             45-49                17    4    6    3    2    2
             50-54                29    7    9    5    4    4
             55-59                11    4    2    -    3    2
             60+                  24    5    3    7    4    5
Education    Secondary school     69   14   19   13   14    9
             Bachelor’s degree    92   17   22   17   16   20
             Master’s degree      30    9    5    4    6    6
             Ph.D. or higher       8    1    2    2    -    3
             Trade school         15    1    4    4    4    2
             Prefer not to say     4    3    -    -    -    1
*QG = Questionnaire-Guided, DG = Data-Guided, QA = Questionnaire-Autonomous, DA = Data-Autonomous, C = Control
#The minimum age of participants is 18. We provided this age option to maintain uniformity with the other age ranges.
3.5 Statistical Methods
To report statistics, we follow the guidelines laid out in [21]. For normally distributed data, we report the mean and standard deviation. For data that deviate from the normal distribution, we report the median and interquartile range (IQR). The interquartile range is the difference between the upper quartile (75th percentile) and the lower quartile (25th percentile).

To compare two distributions, we use the Mann-Whitney U test (also known as the Wilcoxon rank-sum test) [ ]. Unlike the parametric Student’s t-test, the Mann-Whitney U test is non-parametric and works well for non-normal distributions. When comparing three or more distributions, we use the Kruskal-Wallis test (the non-parametric equivalent of the one-way ANOVA) [ ].

The experimental design would have allowed us to conduct ANOVAs (or Kruskal-Wallis tests) to examine differences between all 5 conditions. We decided against this because our research questions addressed the degree of autonomy and data sharing separately rather than in combination. The literature provided no basis to hypothesise that any of those combinations would lead to significantly different preferences or behaviours. We also did not want to make many pairwise comparisons merely for the sake of obtaining more comparisons.

Data processing was performed in Python. All statistical tests (except the power analysis) were conducted using the SciPy library [ ], while data visualisation plots were generated using the Matplotlib library [ ]. The power analysis was conducted in Microsoft Excel, using the Mann-Whitney power function from the Real Statistics library [ ].
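The descriptive statistics and two-sample test described above map directly onto NumPy and SciPy. The sketch below is a minimal illustration and assumes SciPy 1.7 or later. The helper names `median_iqr` and `compare_two` are invented, and the example arrays are placeholder numbers, not study data.

```python
import numpy as np
from scipy import stats


def median_iqr(values):
    """Median and interquartile range (75th minus 25th percentile)."""
    q25, median, q75 = np.percentile(np.asarray(values, dtype=float), [25, 50, 75])
    return float(median), float(q75 - q25)


def compare_two(a, b):
    """Two-sided Mann-Whitney U test for two independent samples."""
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return float(result.statistic), float(result.pvalue)


# Placeholder numbers purely for illustration (not data from the study).
group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
print(median_iqr(group_a))  # (3.0, 2.0)
print(compare_two(group_a, group_b))
```

`np.percentile` uses linear interpolation by default, which matches `scipy.stats.iqr` with its default settings. Other quartile conventions can shift the IQR slightly on small samples.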
4.1 Experimental validity
We rst tested whether the inclusion criteria and randomisation
were executed according to our design. Major demographic charac-
teristics as well as the total number of participants, were correctly
balanced across the groups (Table 1). To probe the additional mo-
tivation to use the app beyond the monetary incentive, we asked
participates to rate the extent to which they wanted to reduce the
amount of stress levels on a Likert scale 1 to 7. The median score of
6 (IQR = 2) suggested a generally high interest in reducing stress lev-
els. A Kruskal-Wallis test showed no signicant dierence among
the ve groups (H(4) = 2.34, p> .05), which indicates that the ran-
domisation across the groups was correctly applied and that the
stress level was not expected to act as a confounding factor when
comparing results across the groups.
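Below is a minimal sketch of the randomisation check, assuming one list of 1–7 Likert ratings per group. `scipy.stats.kruskal` returns the H statistic and the p-value. The degrees of freedom reported as H(4) are k − 1 for k = 5 groups. The ratings are hypothetical, not study data.

```python
from scipy import stats


def kruskal_report(ratings_by_group):
    """Kruskal-Wallis H test across independent groups; df = k - 1."""
    h, p = stats.kruskal(*ratings_by_group.values())
    return float(h), len(ratings_by_group) - 1, float(p)


# Hypothetical 1-7 ratings of the wish to reduce stress (not study data).
ratings = {
    "QG": [6, 7, 5, 6, 4, 7],
    "DG": [5, 6, 6, 7, 3, 6],
    "QA": [7, 6, 5, 6, 6, 5],
    "DA": [6, 4, 7, 5, 6, 7],
    "Control": [5, 6, 7, 6, 5, 6],
}
h, df, p = kruskal_report(ratings)
print(f"H({df}) = {h:.2f}, p = {p:.3f}")
```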
Participants were informed that they were going to receive recommended activities personalised for them. In reality, however, the recommended selection of activities (both those sent daily and those included within the app) was random. The success of our priming strategy was therefore a prerequisite for exploring the perception and effects of personalised recommendations. In other domains, such as shopping items, music or movies, people are typically well aware of what constitutes a personalised recommendation. When it comes to the personalisation of interventions, by contrast, there is a low level of understanding of which symptoms and personal characteristics are meaningful. To check the priming, we compared responses to the statement “I believe that activities were personalised for me” (provided at the end of the study in the Exit questionnaire), rated on a 1–7 scale, between the personalisation cluster (QG, DG, QA and DA) and the control group. The personalisation cluster rated the perceived personalisation significantly higher (U = 2725.5, p < .05). Although the activity recommendations were not based on the Big Five personality traits, the participants believed they were, indicating that the priming was successful.

The results from our experiment are summarised in Table 2 and explained in detail in the following sections.

Personalised Recommendations in Mental Health Apps CHI ’21, May 8–13, 2021, Yokohama, Japan

Table 2: Summary of Results

| Category | Measure | Autonomous vs. Guided | Questionnaire vs. Data |
|---|---|---|---|
| In-App Behaviours | Completed Activities | Significantly more completed activities in the autonomy cluster | No significant difference in number of completed activities |
| In-App Behaviours | Ratio of Recommended versus Chosen Activities | Autonomous cluster: 25%; Guided cluster: 60% | - |
| In-App Behaviours | Session Duration | No significant difference in session duration | No significant difference in session duration |
| In-App Behaviours | Activity Ratings | Significantly higher ratings of activities in the autonomy cluster | No significant difference in activity ratings |
| Declarative Data | Preference | All users preferred to have a more guided user experience | All users preferred to complete a personality questionnaire |
| Declarative Data | Privacy Preference | - | All users agreed that providing mobile data had more privacy risks |
| Onboarding Behaviours | Completion time | - | No significant difference in completion time |
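The priming check pools the four personalisation groups into one cluster and compares that cluster with the control group. The sketch below shows the pooling step. It assumes per-group lists of exit-questionnaire ratings (hypothetical values) and uses a two-sided test, because the sidedness of the original test is not stated.

```python
from scipy import stats

# Groups that received (placebo) personalised recommendations.
PERSONALISATION_GROUPS = ("QG", "DG", "QA", "DA")


def pool_groups(ratings_by_group, names):
    """Concatenate per-user ratings of several groups into one cluster."""
    pooled = []
    for name in names:
        pooled.extend(ratings_by_group[name])
    return pooled


# Hypothetical 1-7 ratings of "I believe that activities were personalised for me".
exit_ratings = {
    "QG": [5, 6, 4, 6],
    "DG": [6, 5, 5, 7],
    "QA": [4, 6, 5, 5],
    "DA": [5, 7, 6, 4],
    "Control": [3, 4, 2, 5],
}
cluster = pool_groups(exit_ratings, PERSONALISATION_GROUPS)
result = stats.mannwhitneyu(cluster, exit_ratings["Control"], alternative="two-sided")
print(len(cluster), result.statistic, result.pvalue)
```

The guided (QG+DG), autonomous (QA+DA), questionnaire (QG+QA) and data selection (DG+DA) clusters can be built the same way with a different tuple of group names.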
4.2 Guided vs autonomous user experience
We compare the app usage behaviours and self-reported preferences between the guided (QG+DG) and the autonomous (QA+DA) clusters.
4.2.1 App usage behaviours. The number of completed activities considers only those activities that the user both started and finished. Figure 3 (a) shows that the number of activities completed by users in the autonomous cluster (Mdn = 19, IQR = 22.5) was significantly higher than in the guided cluster (Mdn = 7, IQR = 3), U = 1427, p < .001. We also examined where completed activities came from. In the autonomous cluster, recommended activities started from the home screen, as opposed to activities voluntarily chosen from the library, made up 25% of completed activities. In the guided cluster, recommended activities started from the email reminders, as opposed to activities from the library, made up 60%.
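The recommended-activity share can be computed from a per-activity log. The sketch below assumes each completed activity is labelled with its entry point: either `"recommended"` (home screen or email reminder) or `"library"`. It interprets the reported ratio as the share of a cluster's pooled completed activities that were recommended, which matches the wording in Section 5.3. The log format and labels are hypothetical.

```python
from collections import Counter


def recommended_share(completed_sources):
    """Share of completed activities that were started from a recommendation.

    `completed_sources` has one label per completed activity: "recommended"
    (home screen or email reminder) or "library" (chosen freely).
    """
    counts = Counter(completed_sources)
    total = counts["recommended"] + counts["library"]
    if total == 0:
        return 0.0  # assumption: no completed activities -> share of 0
    return counts["recommended"] / total


# Hypothetical log for one cluster: 5 recommended, 15 library activities.
log = ["recommended"] * 5 + ["library"] * 15
print(recommended_share(log))
```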
Subsequently, we investigated how the degree of autonomy impacted session duration, defined as the median number of seconds for which a user was actively using the app before closing it. We observed no statistically significant difference between the autonomous (Mdn = 184 seconds, IQR = 363.2 seconds) and guided (Mdn = 158 seconds, IQR = 280.4 seconds) clusters, U = 3346, p > .05 (Figure 3 (b)).
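Session duration is defined per user as a median over sessions. It is therefore computed in two steps: first per user, then summarised per cluster. The sketch below assumes a hypothetical log of `(user_id, seconds)` pairs, with one entry per session.

```python
from collections import defaultdict

import numpy as np


def per_user_median_duration(sessions):
    """Median session length in seconds for each user.

    `sessions` is an iterable of (user_id, seconds) pairs, one per session.
    """
    by_user = defaultdict(list)
    for user_id, seconds in sessions:
        by_user[user_id].append(seconds)
    return {user: float(np.median(durations)) for user, durations in by_user.items()}


# Hypothetical session log (not study data).
sessions = [("u1", 100), ("u1", 300), ("u1", 200), ("u2", 50), ("u2", 70)]
medians = per_user_median_duration(sessions)
print(medians)
```

The per-user values of each cluster would then be summarised with Mdn and IQR and compared with a Mann-Whitney U test.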
The design of Foundations provides a simple format for rating activities: users are asked to rate each activity upon its completion with either a thumbs up or a thumbs down. We binary coded these ratings as 1 and 0, respectively, and calculated the proportion of good (1) ratings per user, i.e., number of good ratings / (number of good + bad ratings), which results in a value between 0 and 1. Figure 3 (c) shows that the proportion of good ratings of users in the autonomous cluster (Mdn = 1, IQR = 0.1) was significantly higher than in the guided cluster (Mdn = 0.85, IQR = 0.2), U = 3047, p < .01.
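The proportion of good ratings per user follows directly from the binary coding. In the sketch below, returning `None` for users who rated no activities is an assumption, because the paper does not state how such users were handled.

```python
def proportion_good(ratings):
    """Proportion of thumbs-up ratings: good / (good + bad).

    `ratings` holds one user's binary-coded ratings (1 = thumbs up,
    0 = thumbs down). Returns None if the user rated nothing (assumption).
    """
    if not ratings:
        return None
    return sum(ratings) / len(ratings)


# Hypothetical per-user ratings (not study data).
user_ratings = {"u1": [1, 1, 1, 0], "u2": [1, 1], "u3": []}
proportions = {user: proportion_good(r) for user, r in user_ratings.items()}
print(proportions)
```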
4.2.2 Self-reported preference on autonomy. After using the app for a week, we asked users to rate whether (1) they would like the mental health app to choose an activity/intervention for them (guided) and (2) they would like to choose an activity/intervention for themselves (autonomous). In general, users agreed more strongly that the app should provide an activity to them (Mdn = 5, IQR = 2) than that they should have the autonomy to select their own activities (Mdn = 4, IQR = 2). A Mann-Whitney U test confirmed that this difference in preference was statistically significant (U = 17051.0, p < .001). When asked to directly compare the two options, 77.9% of the users preferred to have an activity provided to them by the mental health app.
Subsequently, we compared the preferences of the guided and autonomous clusters separately. The percentage of users that preferred to have an activity suggested directly by the app was similar across the guided (78.4%) and autonomous (77.8%) clusters. Next, we assessed the difference in ratings between the two statements within each cluster. In both the guided and autonomous clusters, users rated the guided statement (1) higher than the autonomous statement (2), with statistical significance (U = 2931.5, p < .001 and U = 2623.5, p < .01, respectively). This shows that users preferred an app that suggests interventions for them over selecting activities solely on their own, irrespective of whether they received a guided or an autonomous experience.
4.3 Questionnaire vs data selection
We compare the app usage behaviours and self-reported preferences between the questionnaire (QG+QA) and data selection (DG+DA) clusters.
4.3.1 App usage behaviours. Similar to the comparison described in Section 4.2.1, we compared the number of completed activities, the median session duration per user and the proportion of good ratings between the questionnaire and data selection clusters. Using Mann-Whitney U tests, we found no significant difference for any of these metrics (Supplementary Figure 1).
CHI ’21, May 8–13, 2021, Yokohama, Japan Pieritz and Khwaja et al
Figure 3: Differences between the guided and autonomous clusters for (a) average number of completed activities, (b) median session duration per user and (c) proportion of good (1) ratings.
4.3.2 Onboarding behaviours. We aimed to explore whether the way of data sharing (completing the personality questionnaire vs. selecting the data modalities) is related to the time taken to complete the onboarding questionnaire. To do this, we compared the completion time of the questionnaire cluster against that of the data selection cluster. The median time taken to complete the onboarding questionnaire was greater for the questionnaire cluster (Mdn = 142 seconds, IQR = 82 seconds) than for the data selection cluster (Mdn = 102 seconds, IQR = 68 seconds). However, a Mann-Whitney U test indicated no significant difference between the two distributions (U = 1799.5, p > .05). The number of screens and the priming text in the onboarding questionnaires were comparable for the two clusters. The major difference between them was the personality questionnaire versus the smartphone sensing data selection. Hence, during onboarding, we found no significant difference between the time taken to complete the 20-item personality questionnaire and the time needed to select a subset of smartphone sensing data modalities from a list.
In addition, we explored which data categories the users in the data selection cluster were most willing to provide. Figure 4 shows the proportion of users that provided each data modality. The error bars in the figure represent the standard deviation of the proportions obtained individually from DG and DA. Users were least willing to provide 1. call history (25.0%), 2. Bluetooth and WiFi data (26.1%) and 3. noise in the environment sampled from the microphone (34.0%). As expected, these are the data modalities with the largest privacy and security concerns among both users and technologists […]. Conversely, the data modalities that users were most willing to provide are 1. battery level (72.8%), 2. number of steps walked (71.7%) and 3. time spent on different applications (68.5%).
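The bar heights and error bars in Figure 4 can be reproduced from per-user modality selections. This sketch makes two assumptions: the bar height is the pooled proportion across DG and DA, and the error bar is the population standard deviation (`np.std` with its default `ddof=0`) of the two per-group proportions. The input format and modality names are also assumptions.

```python
import numpy as np


def modality_proportions(selections_by_group, modalities):
    """Pooled share of users providing each modality, plus an error bar.

    `selections_by_group` maps a group name (e.g. "DG", "DA") to a list with
    one set of selected modalities per user. The error bar is the population
    standard deviation (ddof=0) of the per-group proportions (assumption).
    """
    results = {}
    for modality in modalities:
        per_group = []
        n_selected, n_users = 0, 0
        for users in selections_by_group.values():
            chosen = sum(modality in user for user in users)
            per_group.append(chosen / len(users))
            n_selected += chosen
            n_users += len(users)
        results[modality] = (n_selected / n_users, float(np.std(per_group)))
    return results


# Hypothetical selections with made-up modality names (not study data).
selections = {
    "DG": [{"battery", "steps"}, {"battery"}],
    "DA": [{"battery"}, {"steps"}, set(), {"battery", "calls"}],
}
print(modality_proportions(selections, ["battery", "steps", "calls"]))
```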
4.3.3 Self-reported preference on data sharing. Users were asked to rate from 1 to 7 two statements. The first was (1) whether they were willing to complete a 5-10 min personality questionnaire (with up to 50 questions) to receive personalised recommendations for activities. The second was (2) whether they were willing to provide personal sensing data (e.g., GPS location) from their smartphone to receive personalised recommendations for activities. A Mann-Whitney U test confirmed with statistical significance (U = 11568, p < .001) that users were more willing to complete a personality questionnaire (Mdn = 6, IQR = 2) than to provide their smartphone sensing data for personalisation (Mdn = 4, IQR = 3). Users were also asked whether they would rather complete a personality questionnaire or provide their smartphone data: 90.4% of the 218 users said they would prefer to complete a personality questionnaire to have a personalised app.

Figure 4: Proportions of users from the data sharing cluster that preferred to provide different data modalities. Column names correspond to the data modalities described in Section 3.3.
Next, we compared the preferences of the questionnaire and data selection clusters. For the direct choice between the two options, the percentage of users that preferred to complete the personality questionnaire instead of providing data was notably high in both clusters (questionnaire: 92.9%; data selection: 85.9%). We also assessed the difference in ratings between the two statements within each cluster. Mann-Whitney U tests showed that users in both clusters rated their willingness to complete a personality questionnaire higher than their willingness to provide smartphone sensing data. The differences were statistically significant (U = 1915, p < .001 for the questionnaire cluster and U = 2112.5, p < .001 for the data selection cluster). This indicates that users preferred to complete the personality questionnaire over providing their smartphone data, irrespective of the way of data sharing they experienced.
4.3.4 Self-reported preference on privacy risks. An additional objective was to investigate whether users viewed the privacy risks of completing a personality questionnaire differently from those of providing their smartphone data. We asked users to rate two statements. The first was (1) whether they believed that filling out personality questionnaires for personalisation has potential privacy and data protection risks. The second was (2) whether they believed that providing a mental health app with their smartphone’s sensing data for personalisation has potential privacy and data protection risks. Overall, users believed that completing a personality questionnaire carried fewer privacy risks (Mdn = 4, SD = 2) than providing sensing data from their smartphones (Mdn = 5, SD = 3). The difference between the two questions was statistically significant (U = 21118.5, p < .05). We found a similar trend within each of the two clusters. Both clusters rated the privacy risks of providing smartphone sensing data higher than those of completing a personality questionnaire, with statistical significance (U = 1206.5, p < .01 for the questionnaire cluster and U = 2106.5, p < .05 for the data selection cluster).
5 Discussion
In this study, we explored how (1) the degree of autonomy in the user experience and (2) the data to be shared impact users’ preferences and app behaviours in a mental health app. In the following, we discuss the results and highlight the main takeaways.
5.1 Asymmetry between in-app behaviours and preference for the degree of autonomy
The balance between autonomy and guidance is a critical topic in personalised recommender systems, and it is of particular importance in digital mental health. In a traditional setting, autonomy is secondary to the expertise of the medical professional when it comes to selecting the right intervention. In digital experiences, however, autonomy has been shown to be an essential design criterion for creating engagement […]. Our results highlight the challenge of finding the right balance between the two. They also shed light on the contrast between users’ preferences and their actual behaviour in the app. Together, these observations provide a set of practical takeaways for user experience designers, which we discuss in the following.
Our ndings demonstrated that the dierence in the degree of au-
tonomy could inuence subsequent behaviours in a mental health
mobile application. We showed signicant between-group dier-
ences in user behaviours, although all participants used the same
application. Since there was no actual personalisation in the app,
our results are independent of the accuracy of a recommendation
system and solely ascribed to the perceived degree of autonomy in
the user experience.
Our results challenge the popular notion that the more personalised or guided an app is, the better it is perceived by users. We observed that a primarily autonomous experience led to the greatest engagement, i.e., the highest number of completed activities and the best ratings. Contrary to expectations, the most guided and tailored experience appeared to discourage users’ exploration and spontaneous app use. However, when asked about their subjective preference after the study had been completed, a significantly higher number of users expressed a preference for more guidance instead of autonomy. This finding reveals a discrepancy between behavioural and declarative data. Our results confirm that the preferences communicated by users do not necessarily translate into improved engagement metrics. This emphasises the importance of interpreting user research results cautiously and, when possible, combining them with quantitative data throughout the process of designing personalised user experiences.
Interestingly, several answers to the free-text question Do you have any suggestions on how Foundations could be more personalised for you? referred to reminders, for instance: “Have daily reminders to help with routine”, “Maybe a reminder to be set daily” and “I like receiving the daily reminders. I have an 18 month old, so maybe you could set the reminder to come back on later, like a snooze button?”. This may inspire an experience design that sits in between autonomy and guidance, e.g., a combination of autonomous navigation and more frequent notifications suggesting personalised content. Such a design could provide more guidance without negatively impacting users’ perceived or actual agency.
In reality, neither of the two clusters of users was exposed to an extreme choice between autonomy and guidance. The imposed mode of content consumption, primarily autonomous versus primarily guided, was clearly reflected in actual app use: the guided cluster completed a significantly higher number of recommended activities than the autonomous cluster. However, the total number of completed activities was roughly three times higher in the autonomous cluster. As efficacy and engagement are key pillars of digital intervention design […], designers can use our results to optimise for these metrics. In line with our findings, interaction in mental health apps could be designed similarly to popular entertainment applications such as Spotify or Netflix. Specifically, the interaction design may directly encourage autonomous navigation while providing easy access to recommended and personalised content, thus mitigating choice overload. Moreover, different trade-offs can be made between engagement and efficacy. Suppose the success of a specific digital therapy depends not on the volume of app use but on targeted engagement with certain interventions. In that case, the user experience can be more guided. On the other hand, autonomous interaction designs would be more suitable for encouraging frequent app use when this is critical for therapy success (e.g., meditation techniques are supposed to be practised regularly for optimal results).
Our results are aligned with autonomy advocates (Ryan & Deci […], Peters […]). However, our findings additionally underline an important space for utilising the advantages of increasingly sophisticated recommender systems, which can ultimately optimise for both efficacy and engagement.
5.2 Users prefer questionnaires but app engagement is unaffected
Personality traits have been used as a foundation for personalising digital health applications […] and for providing personalised activity recommendations that can improve mental well-being […]. Personality traits can be obtained using questionnaires […] or inferred using machine learning models. The latter has given rise to the field of automatic personality detection. Studies in this field have shown that personality can be detected from Facebook, Twitter or Instagram usage […], gaming behaviour […], music preferences […] and smartphone sensing data […]. All of these studies rest on the premise that passively captured digital behaviour data can be used to infer a user’s personality traits automatically with machine learning, without requiring users to answer long questionnaires. However, none of these studies explored users’ preferences regarding the collection of such data to infer their personality passively, especially to personalise features in a real-world application. Our work set out to answer this question in the context of obtaining smartphone sensing data to personalise the user experience in a mental health app.
Our results indicate that an overwhelming majority of users prefer to complete a personality questionnaire over providing their mobile sensing data. This holds irrespective of whether they completed the personality questionnaire before using the app or were asked to provide their smartphone data. These results are consistent with related studies showing users’ improved comprehension of algorithms through "white-box" explanations […]. Users predominantly perceived their smartphone sensing data as entailing more privacy risks than completing a personality questionnaire. This can be attributed to trust and privacy concerns with the collection of any kind of digital data [12, 19].
Although smartphone sensing was perceived as obtrusive, there was no difference in app behaviour between users who completed a personality questionnaire and those who opted to provide mobile sensing data. Additionally, results from the onboarding process indicate no significant difference between the time taken to complete the data consent process and the time taken to complete the 20-item personality questionnaire […]. As expected, users were less willing to provide more invasive data such as call history, Bluetooth data and noise from the microphone. This can have a significant impact on the accuracy of personality prediction models, since recent studies have indicated that call history data […], Bluetooth data […] and noise data from the microphone […] are strong predictors of personality traits.
If mobile sensing data is not leveraged to provide users with benefits other than personality modelling for personalising the user experience, app designers may consider avoiding the collection of smartphone data altogether. Users appear to have a strong preference for completing a questionnaire instead. Automatic personality modelling is supposed to reduce end-user effort, but in this context it does not appear to bring added value.
This was further echoed by users’ answers to the free-text question Do you have any suggestions on how Foundations could be more personalised for you?, including “An in depth questionnaire”, “Maybe a regular opt-in questionnaire so you let the app know whether your conditions or state of mind is changing” and “I think it could be more personalised by asking more about the persons life, work, family and friends.”. This suggests that users may be willing to provide even more personal information than personality, as long as they provide it consciously and directly and the app becomes more tailored to their needs as a result. As the users additionally suggested, momentary information represents an opportunity for personalising the experience even further. In this regard, Ecological Momentary Assessment (EMA) […] has been a widely used method that prompts users (via smartphone notifications) at different times during the day to report how they feel, what they are doing, where they are, and similar. Recent studies have shown that behaviour and mood data collected via mobile EMAs is related to mental health and to health outcomes such as sleep […]. Thus, data gathered from EMA surveys can point out opportune moments to provide personalised interventions. Ultimately, deciding whether to build user models from passive sources or from questionnaires requires practitioners to make a trade-off between the required amount of information, model accuracy, users’ privacy concerns and potential survey fatigue [49].
5.3 Limitations
Our study required us to make several trade-os in the experimental
design, which we discuss in the following.
Firstly, having identical app versions for all groups was an asset for our experimental design, but it was also a limitation. On the one hand, it enabled us to control the perceptual aspect. On the other hand, having more advanced versions would have allowed us to explore the interaction between perceived accuracy and perception of personalisation, which could have made the results more generalisable.
Secondly, we did not personalise the app according to each user’s actual personality. This may raise the question of whether the deception of personalisation impacted users’ trust in the app and resulted in lower app usage. However, providing actual personalisation would have entailed a new set of challenges. In particular, the quality of recommendations is rarely uniform and is frequently biased towards specific user profiles, an issue that would have been difficult or even impossible to control for. Instead, by providing random recommendations drawn from the most popular activities, we reduced the impact of this issue. We recognise that there is no ideal experimental design in this regard and that any choice entails trade-offs. However, 25% of the completed activities in the autonomous group were recommended, which indicates that the choice of the most popular activities was appropriate. Furthermore, the recommendations were perceived as personalised, as tested by comparing the personalisation cluster with the control group (Section 4.1).
Thirdly, we did not collect smartphone data from participants in the data group. As detailed in Section 3.3, we asked users to give us access to their preferred data streams as a basis for personalisation. However, to avoid increasing the complexity of the study, we used these data consent forms only as priming. Collecting smartphone sensing data would have given us an opportunity for a more detailed behavioural analysis and could have extended our findings.
Lastly, all of our participants were recruited in Europe, which may have introduced a cultural bias and reduced the generalisability of our findings.
6 Conclusion
In this study, we investigated how the degree of autonomy in the user experience and different ways of data sharing affect both users’ preference and their actual usage of a mental well-being app. We conducted a randomised placebo study with a two-way factorial design. It consisted of an onboarding questionnaire, app usage over seven days and an exit questionnaire.
Our results revealed an asymmetry between what users declared as their preference for autonomy (versus guidance) and how they used the app in reality. The analysis of in-app behaviours showed that a primarily autonomous design with the option to access content recommendations kept users more engaged with the app than a primarily guided experience design. However, when asked in the form of questionnaires, the majority of participants declared their preference for a more guided experience. The analysis of qualitative data suggested a potential compromise between different experience designs to satisfy both engagement metrics and subjective user preferences.
Personalising the user experience typically requires personal data to be shared, which may affect how the app is used. However, when analysing actual app use, we found no impact of the data source on how users interacted with the app. Interestingly, the time taken to complete a personality questionnaire was comparable to the time taken to complete a form granting consent for the use of smartphone data. Yet, users indicated a strong preference for completing a personality questionnaire over providing their mobile sensing data (to infer their personality).

As mental health applications become increasingly important and rich in content, our study provides key design takeaways on delivering personalised recommendations, ultimately to improve both engagement and efficacy of interventions.
Acknowledgments
We would like to thank Emily Stott and Jordan Drewitt for their feedback and support. This work has been supported by funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no. 722561.
Jan Blom. 2000. Personalization: a taxonomy. In CHI’00 extended abstracts on
Human factors in computing systems. 313–314.
Christopher Burr, Mariarosaria Taddeo, and Luciano Floridi. 2020. The ethics
of digital well-being: A thematic review. Science and engineering ethics (2020),
Silvina Catuara-Solarz, Bartlomiej Skorulski, Inaki Estella, Claudia Avella-Garcia,
Sarah Shepherd, Emily Stott, and Sophie Dix. 2021. Ecacy of Foundations, a
Digital Mental Health App to Improve Mental Well-Being, during COVID-19: A
Randomised Controlled Trial. Manuscript submitted for publication (2021).
Nitesh V Chawla and Darcy A Davis. 2013. Bringing big data to personalized
healthcare: a patient-centered framework. Journal of general internal medicine
28, 3 (2013), 660–665.
Hao-Fei Cheng, Ruotong Wang, Zheng Zhang, Fiona O’Connell, Terrance Gray,
F Maxwell Harper, and Haiyi Zhu. 2019. Explaining decision-making algorithms
through UI: Strategies to help non-expert stakeholders. In Proceedings of the 2019
chi conference on human factors in computing systems. 1–12.
Prerna Chikersal, Danielle Belgrave, Gavin Doherty, Angel Enrique, Jorge E
Palacios, Derek Richards, and Anja Thieme. 2020. Understanding client support
strategies to improve clinical outcomes in an online mental health intervention.
In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems.
Gokul Chittaranjan, Jan Blom, and Daniel Gatica-Perez. 2011. Who’s who with big-
ve: Analyzing and classifying personality traits with smartphones. In Wearable
Computers (ISWC), 2011 15th Annual International Symposium on. IEEE, 29–36.
Gokul Chittaranjan, Jan Blom, and Daniel Gatica-Perez. 2013. Mining large-scale
smartphone data for personality studies. Personal and Ubiquitous Computing 17,
3 (2013), 433–450.
Yves-Alexandre de Montjoye, Jordi Quoidbach, Florent Robic, and Alex Sandy
Pentland. 2013. Predicting personality using novel mobile phone-based metrics.
In International conference on social computing, behavioral-cultural modeling, and
prediction. Springer, 48–55.
[10] Edward L Deci and Richard M Ryan. 2012. Self-determination theory. (2012).
M Brent Donnellan, Frederick L Oswald, Brendan M Baird, and Richard E Lucas.
2006. The mini-IPIP scales: tiny-yet-eective measures of the Big Five factors of
personality. Psychological assessment 18, 2 (2006), 192.
Catherine Dwyer, Starr Hiltz, and Katia Passerini. 2007. Trust and privacy concern
within social networking sites: A comparison of Facebook and MySpace. AMCIS
2007 proceedings (2007), 339.
Mahmoud Elkhodr, Seyed Shahrestani, and Hon Cheung. 2012. A review of
mobile location privacy in the internet of things. In 2012 Tenth International
Conference on ICT and Knowledge Engineering. IEEE, 266–272.
Gunther Eysenbach. 2005. The law of attrition. Journal of medical Internet
research 7, 1 (2005), e11.
Bruce Ferwerda and Marko Tkalcic. 2018. Predicting Users’ Personality from
Instagram Pictures: Using Visual and/or Content Features?. In The 26th Conference
on User Modeling, Adaptation and Personalization, Singapore (2018).
Bruce Ferwerda and Marko Tkalcic. 2018. You Are What You Post: What the Con-
tent of Instagram Pictures Tells About Users’ Personality. In The 23rd International
on Intelligent User Interfaces.
Luciano Floridi. 2016. Tolerant paternalism: Pro-ethical design as a resolution of
the dilemma of toleration. Science and engineering ethics 22, 6 (2016), 1669–1688.
Luciano Floridi, Josh Cowls, Monica Beltrametti, Raja Chatila, Patrice Chazerand,
Virginia Dignum, Christoph Luetge, Robert Madelin, Ugo Pagallo, Francesca Rossi,
et al
2018. AI4People—an ethical framework for a good AI society: opportunities,
risks, principles, and recommendations. Minds and Machines 28, 4 (2018), 689–
Peter Gilbert, Landon P Cox, Jaeyeon Jung, and David Wetherall. 2010. Toward
trustworthy mobile sensing. In Proceedings of the Eleventh Workshop on Mobile
Computing Systems & Applications. 31–36.
Lewis R Goldberg, John A Johnson, Herbert W Eber, Robert Hogan, Michael C
Ashton, C Robert Cloninger, and Harrison G Gough. 2006. The international
personality item pool and the future of public-domain personality measures.
Journal of Research in personality 40, 1 (2006), 84–96.
Farrokh Habibzadeh. 2017. Statistical data editing in scientic articles. Journal
of Korean medical science 32, 7 (2017), 1072–1076.
Sajanee Halko and Julie A Kientz. 2010. Personality and persuasive technology:
an exploratory study on health-promoting mobile applications. In International
conference on persuasive technology. Springer, 150–161.
Margeret Hall and Simon Caton. 2017. Am I who I say I am? Unobtrusive self-
representation and personality recognition on Facebook. PloS one 12, 9 (2017),
John D Hunter. 2007. Matplotlib: A 2D graphics environment. Computing in
science & engineering 9, 3 (2007), 90–95.
Arshad Jamal, Jane Coughlan, and Muhammad Kamal. 2013. Mining social
network data for personalisation and privacy concerns: a case study of Facebook’s
Beacon. International Journal of Business Information Systems 13, 2 (2013), 173–
Eric Jones, Travis Oliphant, Pearu Peterson, et al
2001. SciPy: Open source
scientic tools for Python. (2001).
Ivar Jorstad, D Van Thanh, and Schahram Dustdar. 2005. The personalization of
mobile services. In WiMob’2005), IEEE International Conference on Wireless And
Mobile Computing, Networking And Communications, 2005., Vol. 4. IEEE, 59–65.
Evangelos Karapanos. 2015. Sustaining user engagement with behavior-change
tools. Interactions 22, 4 (2015), 48–52.
Mohammed Khwaja, Miquel Ferrer, Jesus Omana Iglesias, A Aldo Faisal, and
Aleksandar Matic. 2019. Aligning daily activities with personality: towards a
recommender system for improving wellbeing. In Proceedings of the 13th ACM
Conference on Recommender Systems. 368–372.
Mohammed Khwaja, Sumer S Vaid, Sara Zannone, Gabriella M Harari, A Aldo
Faisal, and Aleksandar Matic. 2019. Modeling personality vs. modeling personalidad:
In-the-wild mobile data analysis in five countries suggests cultural impact
on personality models. Proceedings of the ACM on Interactive, Mobile, Wearable
and Ubiquitous Technologies 3, 3 (2019), 1–24.
CHI ’21, May 8–13, 2021, Yokohama, Japan Pieritz and Khwaja et al
Seoyoung Kim, Arti Thakur, and Juho Kim. 2020. Understanding Users’ Perception
Towards Automated Personality Detection with Group-specific Behavioral Data.
In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems.
Younghwa Lee, Kenneth A Kozar, and Kai RT Larsen. 2003. The technology
acceptance model: Past, present, and future. Communications of the Association
for information systems 12, 1 (2003), 50.
Cong Li. 2016. When does web-based personalization really work? The distinction
between actual personalization and perceived personalization. Computers in
Human Behavior 54 (2016), 25–33.
Rui Neves Madeira, Helena Germano, Patrícia Macedo, and Nuno Correia. 2018.
Personalising the user experience of a mobile health application towards Patient
Engagement. Procedia Computer Science 141 (2018), 428–433.
Tamar R Makin, Frederique de Vignemont, and A Aldo Faisal. 2017. Neurocognitive
barriers to the embodiment of technology. Nature Biomedical Engineering 1,
1 (2017), 1–3.
Aleksandar Matic, Martin Pielot, and Nuria Oliver. 2017. "OMG! How did it
know that?" Reactions to Highly-Personalized Ads. In Adjunct Publication of the
25th Conference on User Modeling, Adaptation and Personalization. 41–46.
Jonathan Mayer, Patrick Mutchler, and John C Mitchell. 2016. Evaluating the
privacy properties of telephone metadata. Proceedings of the National Academy
of Sciences 113, 20 (2016), 5536–5541.
Patrick E McKnight and Julius Najab. 2010. Kruskal-Wallis test. The Corsini
encyclopedia of psychology (2010), 1–1.
Patrick E McKnight and Julius Najab. 2010. Mann-Whitney U Test. The Corsini
encyclopedia of psychology (2010), 1–1.
Bjarke Mønsted, Anders Mollgaard, and Joachim Mathiesen. 2018. Phone-based
metric as a predictor for basic personality traits. Journal of Research in Personality
74 (2018), 16–22.
Rahul Mukerjee and CF Jeff Wu. 2007. A modern theory of factorial design.
Springer Science & Business Media.
Elizabeth Murray, Eric B Hekler, Gerhard Andersson, Linda M Collins, Aiden
Doherty, Chris Hollis, Daniel E Rivera, Robert West, and Jeremy C Wyatt. 2016.
Evaluating digital health interventions: key questions and approaches.
Gideon Nave, Juri Minxha, David M Greenberg, Michal Kosinski, David Stillwell,
and Jason Rentfrow. 2018. Musical preferences predict personality: evidence from
active listening and facebook likes. Psychological Science 29, 7 (2018), 1145–1158.
Seth M Noar, Christina N Benac, and Melissa S Harris. 2007. Does tailoring matter?
Meta-analytic review of tailored print health behavior change interventions.
Psychological bulletin 133, 4 (2007), 673.
World Health Organization et al. 2020. Coronavirus disease 2019 (COVID-19):
situation report, 72. (2020).
Wei Peng, Shaheen Kanthawala, Shupei Yuan, and Syed Ali Hussain. 2016. A
qualitative study of user perceptions of mobile health apps. BMC public health
16, 1 (2016), 1–11.
Dorian Peters, Rafael A Calvo, and Richard M Ryan. 2018. Designing for motivation,
engagement and wellbeing in digital experience. Frontiers in psychology 9
(2018), 797.
Betty Pfeerbaum and Carol S North. 2020. Mental health and the Covid-19
pandemic. New England Journal of Medicine (2020).
Stephen R Porter, Michael E Whitcomb, and William H Weitzer. 2004. Multiple
surveys of students and survey fatigue. New directions for institutional research
2004, 121 (2004), 63–73.
Daryl Pullman. 1999. The ethics of autonomy and dignity in long-term care.
Canadian Journal on Aging/La Revue canadienne du vieillissement 18, 1 (1999),
Cosima Rughiniş, Răzvan Rughiniş, and Ştefania Matei. 2015. A touching app
voice thinking about ethics of persuasive technology through an analysis of
mobile smoking-cessation apps. Ethics and Information Technology 17, 4 (2015),
Richard M Ryan and Edward L Deci. 2000. Self-determination theory and the
facilitation of intrinsic motivation, social development, and well-being. American
psychologist 55, 1 (2000), 68.
Richard M Ryan, C Scott Rigby, and Andrew Przybylski. 2006. The motivational
pull of video games: A self-determination theory approach. Motivation and
emotion 30, 4 (2006), 344–360.
Barry Schwartz. 2004. The paradox of choice: Why more is less. Ecco New York.
Saul Shiman, Arthur A Stone, and Michael R Huord. 2008. Ecological momen-
tary assessment. Annu. Rev. Clin. Psychol. 4 (2008), 1–32.
Janice C Sipior, Burke T Ward, and Linda Volonino. 2014. Privacy concerns
associated with smartphone use. Journal of Internet Commerce 13, 3-4 (2014),
Marcin Skowron, Marko Tkalčič, Bruce Ferwerda, and Markus Schedl. 2016.
Fusing social media cues: personality prediction from twitter and instagram. In
Proceedings of the 25th international conference companion on world wide web.
International World Wide Web Conferences Steering Committee, 107–108.
Jacopo Staiano, Bruno Lepri, Nadav Aharony, Fabio Pianesi, Nicu Sebe, and Alex
Pentland. 2012. Friends don’t lie: inferring personality traits from social network
structure. In Proceedings of the 2012 ACM conference on ubiquitous computing.
Amir Tal and John Torous. 2017. The digital mental health revolution: Opportunities
and risks. (2017).
Paul Voigt and Axel Von dem Bussche. 2017. The EU General Data Protection
Regulation (GDPR). A Practical Guide, 1st Ed., Cham: Springer International
Publishing (2017).
Sarah Theres Völkel, Renate Haeuslschmid, Anna Werner, Heinrich Hussmann,
and Andreas Butz. 2020. How to Trick AI: Users’ Strategies for Protecting
Themselves from Automatic Personality Assessment. In Proceedings of the 2020
CHI Conference on Human Factors in Computing Systems. 1–15.
Rui Wang, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie
Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T Campbell. 2014. StudentLife:
assessing mental health, academic performance and behavioral trends of college
students using smartphones. In Proceedings of the 2014 ACM international joint
conference on pervasive and ubiquitous computing. 3–14.
Weichen Wang, Gabriella M Harari, Rui Wang, Sandrine R Müller, Shayan Mirjafari,
Kizito Masaba, and Andrew T Campbell. 2018. Sensing Behavioral Change
over Time: Using Within-Person Variability Features from Mobile Sensing to
Predict Personality Traits. Proceedings of the ACM on Interactive, Mobile, Wearable
and Ubiquitous Technologies 2, 3 (2018), 141.
Nick Yee, Nicolas Ducheneaut, Les Nelson, and Peter Likarish. 2011. Introverted
elves & conscientious gnomes: the expression of personality in world of warcraft.
In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
ACM, 753–762.
Charles Zaiontz. 2020. Real statistics using Excel. [Computer software]
http://www.real-statistics.com/free-download (2020).
Markus Zanker, Laurens Rook, and Dietmar Jannach. 2019. Measuring the impact
of online personalisation: Past, present and future. International Journal of
Human-Computer Studies 131 (2019), 160–168.