A Data-Driven Approach to Developing IoT Privacy-Setting
Interfaces
Leave Authors Anonymous
for Submission
City, Country
e-mail address
ABSTRACT
User testing is often used to inform the development of
user interfaces (UIs). But what if an interface needs
to be developed for a system that does not yet exist?
In that case, existing datasets can provide valuable in-
put for UI development. We apply a data-driven ap-
proach to the development of a privacy-setting interface
for Internet-of-Things (IoT) devices. Applying machine
learning techniques to an existing dataset of users’ shar-
ing preferences in IoT scenarios, we develop a set of
“smart” default profiles. Our resulting interface asks
users to choose among these profiles, which capture their
preferences with an accuracy of 82%—a 14% improve-
ment over a naive default setting and a 12% improve-
ment over a single smart default setting for all users.
ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g.
HCI): Miscellaneous;
Author Keywords
Data-driven design; Internet of Things; Privacy
settings; Machine learning
INTRODUCTION
Under the moniker of ‘Internet of Things’ (IoT), smart
connected devices are revolutionizing our everyday life,
just like smartphones did for cellphones. Smartphones,
however, have been shown to increase users’ privacy concerns
[4], and the same may be true for IoT. Like smartphones,
IoT devices collect and store personal information to per-
sonalize the user experience, share it across other de-
vices, and/or sell it to third parties. Consequently, pre-
serving users’ privacy is a big concern that limits the
adoption of IoT devices [10].
Privacy is an inherent trade-off in IoT, because IoT de-
vices cannot provide their services without collecting
data. Preserving users’ privacy therefore means giving
them control over this trade-off, by allowing them to
decide what information can be collected about them.
Outside the home environment, people have little control
over the data IoT devices collect. Researchers at Intel
are working on a framework that allows people to be no-
tified about surrounding IoT devices collecting personal
information, and to control these collection practices [5].
Smartphones give users control over their privacy set-
tings in the form of prompts that ask whether the user
allows or denies a certain app access to a certain type
of information. Such prompts are problematic for IoT,
because IoT devices are supposed to operate in the back-
ground. Moreover, as the penetration of IoT devices in
our environment continues to increase, prompts would
become a constant noise which users will soon start to
ignore, like software EULAs [8] or privacy policies [12].
A better solution would be to regulate privacy with
global settings. However, research has shown that while
users are highly concerned about their privacy, they find
it difficult to implement privacy settings [1, 9, 19]. Indeed,
the vast number of encounters people have with a myriad
of different IoT devices makes choosing adequate privacy
settings a very challenging task that is likely to result in
information and choice overload [28].
Data-driven design
What design process allows us to develop a usable
privacy-setting interface for IoT? The development of us-
able privacy interfaces commonly relies on user studies
with existing systems. However, this method is not pos-
sible in our IoT control scenario, because the Intel con-
trol framework has yet to be implemented [5]. We there-
fore develop and employ a data-driven design methodol-
ogy, leveraging an existing dataset collected by Lee and
Kobsa [16], who asked users whether they would allow
or deny IoT devices in their environment to collect infor-
mation about them. We use this dataset in two phases.
In our first phase, we develop a “layered” settings in-
terface, where users make a decision on a less granular
level (e.g., whether a certain recipient is allowed to col-
lect their personal information or not), and only move
to a more granular decision (e.g., what types of informa-
tion this recipient is allowed to collect) when they desire
more detailed control. This reduces the complexity of
the decisions users have to make, without reducing the
amount of control available to them. We use statistical
analysis of the Lee and Kobsa dataset to decide which
aspect should be presented at the highest layer of our
IoT privacy-setting interface, and which aspects are rel-
egated to subsequently lower layers.
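The layered lookup described above can be sketched as a simple override structure. This is only an illustrative sketch; the recipient and data-type names below are hypothetical, not the paper’s final interface:

```python
# A minimal sketch of a layered privacy-setting lookup: a coarse
# per-recipient rule can be overridden by a finer (recipient, data type)
# rule. All names here are illustrative placeholders.
coarse_layer = {"Own device": "allow", "Government": "deny"}
fine_layer = {("Own device", "Video"): "deny"}  # hypothetical override

def decision(recipient, data_type, fallback="deny"):
    # The most granular matching rule wins; otherwise fall back to the
    # coarse per-recipient layer, and finally to a global default.
    return fine_layer.get((recipient, data_type),
                          coarse_layer.get(recipient, fallback))

print(decision("Own device", "Location"))  # coarse rule applies
print(decision("Own device", "Video"))     # fine-grained override wins
```

Users who never open the finer layer simply inherit the coarse decision, which is what keeps the interface’s complexity low without reducing control.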
In our second phase, we develop a “smart” default set-
ting, which preempts the need for many users to man-
ually change their settings [26]. However, since people
differ extensively in their privacy preferences [20], it is
not possible to achieve an optimal default that is the
same for everyone. Instead, different people may require
different settings. Outside the field of IoT, researchers
have been able to establish distinct clusters or “profiles”
based on user behavioral data [14, 20, 29]. We perform
machine learning analysis on the Lee and Kobsa dataset
to create a similar set of “smart profiles” for our IoT
privacy-setting interface.
The remainder of this paper is structured as follows: We
first summarize previous work on privacy in IoT scenar-
ios, and describe the structure of the Lee and Kobsa [16]
dataset. We then inspect users’ behaviors using statis-
tical analysis. Next, we predict users’ behaviors using
machine learning methods. We subsequently present a
set of prototypes for an IoT privacy-setting interface.
Finally, we conclude with a summary of our proposed
procedure and the results of our analysis.
APPROACH AND RELATED WORK
Our goal is to develop intuitive interfaces for IoT privacy
settings, using a data-driven approach. In this section
we therefore discuss existing research on privacy-setting
interfaces and on privacy prediction.
Privacy-Setting Interfaces
The most basic privacy-setting interface is the tradi-
tional “access control matrix”, which allows users to in-
dicate who gets to see what [25]. This approach can be
further simplified by grouping recipients into relevant se-
mantic categories, such as Google+’s circles [27]. Tak-
ing a step further, Raber et al. [22] proposed Privacy
Wedges to manipulate privacy settings. Privacy Wedges
allow users to make privacy decisions using a combina-
tion of semantic categorization (the various wedges) and
inter-personal distance (the position of a person on the
wedge). Users can decide who gets to see various posts or
personal information by “coloring” parts of each wedge.
Privacy wedges have been tested on limited numbers of
friends, and in the case of IoT they are likely to be in-
sufficient, due to the complexity of the decision space.
To wit, IoT privacy decisions involve a large selection
of devices, each with various sensors that collect data
for a range of different purposes. This makes it com-
plicated to design an interface that covers every possi-
ble setting [28]. A wedge-based interface will arguably
not be able to succinctly represent such complexity, and
therefore either be impossible, or still lead to a signifi-
cant amount of information and choice overload.
We propose a data-driven approach to solve this prob-
lem: statistical analysis informs the construction of a
layered settings interface, while machine learning-based
privacy prediction helps us find smart privacy profiles.
Privacy Prediction
Several researchers have proposed privacy prediction as
a solution to the privacy settings complexity problem.
Sadeh et al. used a k-nearest neighbor algorithm and a
random forest algorithm to predict users’ privacy pref-
erences in a location-sharing system [24], based on the
type of recipient and the time and location of the re-
quest. They demonstrated that users had difficulties
setting their privacy preferences, and that the applied
machine learning techniques can help users to choose
more accurate disclosure preferences. Similarly, Pallapa
et al. [21] present a system which can determine the re-
quired privacy level in new situations based on the his-
tory of interaction between users. Their system can ef-
ficiently deal with the rise of privacy concerns and help
users in a pervasive system full of dynamic interactions.
Dong et al. [6] use a binary classification algorithms
to give users personalized advice regarding their pri-
vacy decision-making practices on online social networks.
They found that J48 decision trees provided the best re-
sults. Li et al. [17] similarly use J48 to demonstrate
that taking the user’s cultural background into account
when making privacy predictions improves the predic-
tion accuracy. Our data stems from a culturally homo-
geneous population (U.S. Mechanical Turk workers), so
cultural variables are outside the scope of our study. We
do however follow these previous works in using J48 de-
cision trees in our prediction approach.
We further extend our approach using clustering to find
several smart default policies (“profiles”). This is in line
with Fang et al. [7], who present an active learning al-
gorithm that comes up with privacy profiles for users
in real time. Since our approach is based on an exist-
ing dataset, our algorithm does not classify users in real
time, but instead creates a static set of profiles ‘offline’,
from which users can subsequently choose. This avoids
cold start problems, and does not rely on the availability
of continuous real-time behaviors. This is beneficial for
IoT settings, because users often specify their settings
in these systems in a “single shot”, leaving the settings
interface alone afterwards.
Ravichandran et al. [23] employ an approach similar to
ours, using k-means clustering on users’ contextualized
location sharing decisions to come up with several de-
fault policies. They showed that a small number of de-
fault policies could accurately reflect a large part of the
location sharing preferences. We extend their approach
to find the best profiles based on various novel cluster-
ing approaches, and take the additional step of designing
user interfaces that incorporate the best solutions.
We apply our procedure to a dataset by Lee and
Kobsa [16], who presented users with a total of 2800
IoT usage scenarios that were systematically manipulated
along five dimensions (see next section). Using this
dataset, Lee and Kobsa observed that these scenarios can
be grouped into four clusters in terms of potential pri-
vacy risks. The subsequent clusters differ substantially
along several dimensions, most notably regarding the in-
quirer (the ‘who’) and data type (the ‘what’). The dom-
inance of the ‘who’ parameter is also reflected in a study
in a ubiquitous computing environment by Lederer et
al. [15]. Extending upon Lee and Kobsa, our clustering
procedure is performed at the user level rather than the
scenario level. This allows us to create privacy profiles.
DATASET
This study is based on a dataset collected by Lee and
Kobsa [16]. A total of 2800 scenarios were presented
to 200 participants (100 male, 99 female, 1 undisclosed)
through Amazon Mechanical Turk. Four participants
were between 18 and 20 years old, 75 between 20 and
30, 68 between 30 and 40, 31 between 40 and 50, 20
between 50 and 60, and 2 were older than 60.
Each participant was presented with 14 scenarios de-
scribing a situation where an IoT device would collect
information about the participant. Each scenario was
a combination of five contextual parameters (Table 1),
manipulated at several levels using a mixed fractional
factorial design that allowed testing of main effects and
two-way interactions between all parameters.
For every scenario, participants were asked a total of 9
questions. Our study focuses on the allow/reject ques-
tion: “If you had a choice to allow/reject this, what
would you choose?”, with answer options “I would allow
it” and “I would reject it”. We also used participants’
answers to three attitudinal questions regarding the sce-
nario:
Risk: How risky or safe is this situation? (7pt scale
from very risky to very safe)
Comfort: How comfortable or uncomfortable do you
feel about this situation? (7pt scale from very uncom-
fortable to very comfortable)
Appropriateness: How appropriate do you consider
this situation? (7pt scale from very inappropriate to
very appropriate)
INSPECTING USERS’ BEHAVIORS
In this section we analyze how users’ behavioral
intentions—specifically, whether they would allow or
reject the information collection described in the
scenario—are influenced by the scenario parameters.
In line with classic attitude-behavior models [2],
we also investigate whether users’ attitudes regard-
ing the scenario—their judgment of risk, comfort, and
appropriateness—mediate these effects. Our statistical
analysis tests a mediation model between the scenario
parameters, attitudes, and behavioral intentions. This
mediation analysis [3] involves the following test:
Table 1: Parameters used in the experiment. Example
scenario: “A device of a friend records your video to
detect your presence. This happens continuously, while
you are at someone else’s place, for your safety.”
Parameter Levels
Who
The entity collecting
the data
1. Unknown
2. Colleague
3. Friend
4. Own device
5. Business
6. Employer
7. Government
What
The type of data
collected and
(optionally) the
knowledge extracted
from this data
1. PhoneID
2. PhoneID>identity
3. Location
4. Location>presence
5. Voice
6. Voice>gender
7. Voice>age
8. Voice>identity
9. Voice>presence
10. Voice>mood
11. Photo
12. Photo>gender
13. Photo>age
14. Photo>identity
15. Photo>presence
16. Photo>mood
17. Video
18. Video>gender
19. Video>age
20. Video>presence
21. Video>mood
22. Video>looking at
23. Gaze
24. Gaze>looking at
Where
The location of the
data collection
1. Your place
2. Someone else’s place
3. Semi-public place (e.g.
restaurant)
4. Public space (e.g. street)
Reason
The reason for
collecting this data
1. Safety
2. Commercial
3. Social-related
4. Convenience
5. Health-related
6. None
Persistence
Whether data is collected once or continuously
1. Once
2. Continuously
Test 1: The effect of the scenario parameters (who,
what, where, reason, persistence) on participants’ at-
titudes (risk, comfort, appropriateness).
Test 2: The effect of participants’ attitudes on their
behavioral intentions (the allow/reject decision).
Test 3: The effect of the parameters on behavioral
intentions, controlling for attitudes.
If tests 1 and 2 are significant, and test 3 reveals a sub-
stantial reduction in conditional direct effect (compared
to the marginal effect), then we can say that the effects
of the scenario parameters on participants’ behavioral
intention are mediated by their attitudes. Moreover, if
the conditional direct effect is (close to) zero, then the
effects are fully (rather than partially) mediated.
Scenario Parameters and Attitude
ANOVA Test of Main Effects
To understand the effect of the scenario parameters on
participants’ attitudes, we created linear mixed effects
regression (lmer) models with a random intercept to ac-
count for repeated measures on the same participant.
We considered separate models for each of the dependent
variables (risk, comfort, appropriateness), using the sce-
nario parameters as independent variables. We employ
a forward stepwise regression procedure to include the
strongest remaining parameter into the model at each
step, comparing each model against the previous model.
Table 2 shows that all scenario parameters except where
have a significant effect on each of the attitudes.
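Each χ² in Table 2 comes from a likelihood-ratio test between nested models in the stepwise procedure. A sketch of that test, assuming SciPy is available; the log-likelihood values below are hypothetical stand-ins, not fitted output:

```python
from scipy.stats import chi2

# Likelihood-ratio test between two nested mixed-effects models, as in
# the forward stepwise procedure: chi-square = 2 * (LL_full - LL_reduced).
# The log-likelihoods are hypothetical placeholders.
ll_reduced = -4200.0   # model without the parameter being tested
ll_full = -4042.3      # model with, e.g., a 7-level factor added
lr_stat = 2 * (ll_full - ll_reduced)   # likelihood-ratio chi-square
df_diff = 6                            # a 7-level factor costs 6 df
p_value = chi2.sf(lr_stat, df_diff)    # upper-tail probability
```

The new parameter is retained when this p-value is significant, which is how each row of Table 2 is tested against the model on the row above it.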
Post-hoc Comparisons
We also conducted Tukey post hoc analyses to better
understand how the various values of each parameter in-
fluenced the attitudes. Where was excluded from these
analyses, as it did not have an overall significant effect.
Some key findings of these post hoc analyses are:
Who: Participants perceive more risk when the recipient
of the information is ‘unknown’ than for any other
recipient (d range = [0.640, 1.450], all ps < .001,
except for ‘government’: d = 0.286, p < .05). ‘Government’
is the next most risky recipient (d range = [0.440, 1.190],
all ps < .001). Participants consider their ‘own device’
the least risky (d range = [0.510, 1.450], all ps < .001).
Similar patterns were found for comfort and appropriateness.
Reason: Participants were more comfortable disclosing
information for the purpose of ‘safety’ than for any
other reason except ‘health’ (d range = [0.230, 0.355], all
ps < .05). They also believe that disclosing information
for the purpose of ‘health’ or ‘safety’ is more appropriate
than for ‘social’ or ‘commercial’ purposes (d range =
[0.270, 0.310], all ps < .05).
Persistence: Participants were more comfortable with,
found it more appropriate, and perceived less risk in
disclosing their information ‘once’ rather than ‘continuously’
(d = 0.146, p < .01).
Table 2: Effect of scenario on attitudes. Each model
builds upon and is tested against the previous.
Model χ² df p-value
risk (1|sid)
+who 315.37 6 <.0001
+what 67.74 23 <.0001
+reason 15.65 5 .0079
+persistence 9.95 1 .0016
+where 7.47 3 .0586
+who:what 166.47 138 .0050
Model χ² df p-value
comfort (1|sid)
+who 334.06 6 <.0001
+what 83.24 23 <.0001
+reason 18.68 5 .0022
+persistence 14.73 1 .0001
+where 3.25 3 .3544
+who:what 195.07 138 .0001
Model χ² df p-value
appropriateness (1|sid)
+who 315.77 6 <.0001
+what 72.87 23 <.0001
+reason 23.27 5 .0003
+persistence 8.97 1 .0027
+where 5.46 3 .1411
+who:what 214.61 138 <.0001
What: This parameter has a large number of values, so
we decided to selectively test planned contrasts instead
of post-hoc tests. We first compared different mediums
(voice, photo, video) regardless of what is being inferred:
Participants were significantly more comfortable with
‘voice’ than ‘video’ (d = 0.260, p = .005), and found
‘voice’ less risky (d = 0.239, p = .005) and more
appropriate (d = 0.217, p = .015) than ‘video’.
Participants were significantly more comfortable with
‘voice’ than ‘photo’ (d = 0.201, p = .007) and found
‘voice’ more appropriate than ‘photo’ (d = 0.157,
p = .028). There was no significant difference in terms
of risk (p = .118).
No differences were found between ‘photo’ and ‘video’
in terms of risk (p = .24), comfort (p = .35), and
appropriateness (p = .26).
We also compared different inferences (e.g. age, gender,
mood, identity) across mediums. The following planned
contrasts were significant (all others were not):
Participants were significantly more comfortable
(d = 0.363, p = .028) and found it more appropriate
(d = 0.371, p = .018) to reveal their ‘age’ rather
than their ‘identity’.
Participants were significantly more comfortable
(d = 0.363, p = .008) and found it more appropriate
(d = 0.308, p = .024) to reveal their ‘presence’
rather than their ‘identity’.
Table 3: Effect of attitudes and scenario on allow/reject.
Model OR χ² df p-value
allow (1|sid)
+risk 0.25 1005.24 1 <.0001
+comfort 5.04 723.27 1 <.0001
+appropriateness 3.47 128.17 1 <.0001
+who 8.80 6 .1851
+what 26.07 23 .2976
+reason 19.33 5 .0017
+persistence 12.69 1 .0004
Interaction effects
We also checked for two-way interactions between the
scenario parameters. The only significant interaction ef-
fect observed was between who and what. The last line
of each section in Table 2 shows the results of adding
this interaction to the model. Due to space concerns,
we choose not to address the post-hoc analysis of the
7 × 24 = 168 specific combinations of who and what.
Attitude and Behavioral intention
To test the effects of participants’ attitudes on their in-
tention to allow or reject the scenario, we created a gen-
eralized linear mixed effects regression (glmer ) model
with a random intercept to account for repeated mea-
sures on the same participant, and a logit link function to
account for the binary dependent variable. We introduce
the attitudinal variables (risk, comfort, appropriateness)
as predictors in a forward stepwise fashion.
We found significant effects of all three attitudinal
factors on participants’ intention to allow or reject the
information collection (see Table 3). Each 1-point increase
in risk results in a 4.04-fold decrease in the odds
that the scenario will be allowed (p < .0001). Each 1-point
increase in comfort results in a 5.04-fold increase
(p < .0001), and each 1-point increase in appropriateness
results in a 3.47-fold increase (p < .0001).
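These fold-changes are the exponentiated logit coefficients. A quick sketch of the conversion; the coefficient values here are back-computed from the reported odds ratios, not re-estimated:

```python
from math import exp, log

# A logit-model coefficient beta corresponds to an odds ratio exp(beta)
# per 1-point increase in the predictor. The beta below is back-computed
# from the paper's reported odds ratio for comfort.
beta_comfort = log(5.04)
odds_ratio_comfort = exp(beta_comfort)   # 5.04-fold increase per point

# For risk the odds ratio is below 1 (Table 3 rounds it to 0.25), so it
# is reported as a fold *decrease* of 1/OR; note 1/0.25 = 4.0, and the
# reported 4.04 reflects the unrounded coefficient.
fold_decrease_risk = 1 / 0.25
```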
Mediation Analysis
The bottom half of Table 3 shows the conditional effects
of the significant scenario parameters (who, what,
reason, persistence) on participants’ intention to allow
or reject the scenario, controlling for the attitudinal
factors. Who and what are not significant, which suggests
that these effects are fully mediated by the attitudinal
factors. The effects of reason and persistence are still
significant, but smaller than the marginal effects (i.e.,
without controlling for attitude, see Table 4)—their χ²
values are reduced by 12% and 39%, respectively. This means
that the mediation effect was substantial in all cases.
The final mediation model is displayed in Figure 1.
Discussion of Statistical Results
Our statistical results show several patterns that can in-
form the development of an IoT privacy-setting inter-
face. We find that who is the most important scenario
parameter, and should thus end up at the top layer of our
Table 4: Effect of attitudes on allow/reject, not control-
ling for scenario.
Model χ² df p-value
allow (1|sid)
+who 221.36 6 <.0001
+what 78.55 23 <.0001
+reason 21.95 5 .0005
+persistence 20.64 1 <.0001
Figure 1: Mediation model of the effect of the scenario
parameters (who, what, reason, persistence) on participants’
intention to allow/reject the scenario, mediated by the
attitudinal factors (risk, comfort, appropriateness);
marginal effects are given in Table 4.
interface. People are generally concerned about IoT scenarios
involving unknown and government devices, but
less concerned about data collected by their own
devices. Mistrust of government data collection is in line
with Li et al.’s finding regarding US audiences [17].
What is the next most important scenario parameter,
and its significant interaction with who suggests that
some users may want to allow/reject the collection of
different types of data by different types of recipients.
Privacy concerns are higher for photo and video than
for voice, arguably because photos and videos are more
likely to reveal the identity of a person. Moreover, people
are less concerned with revealing their age and presence,
and most concerned with revealing their identity.
The reason for the data collection may be used as the
next layer in the interface. Health and safety are gener-
ally seen as acceptable reasons. Persistence is less im-
portant, although one-time collection is more acceptable
than continuous collection. Where the data is being
collected does not influence intention at all. This could
be an artifact of the dataset: location is arguably less
prominent when reading a scenario than it is in real life.
Finally, participants’ attitudes significantly (and in some
cases fully) mediated the effect of scenario parameters on
behavioral intentions. This means that these attitudes
may be used as a valuable source for classifying people
into distinct groups. Such attitudinal clustering could
capture a significant amount of the variation among
participants in terms of their preferred privacy settings,
especially with respect to the who and what dimensions.
Table 5: Comparison of clustering approaches
Approach                   # clusters   Accuracy   # of profiles
Naive classification            1        28.33%    1 (all ‘yes’)
Naive classification            1        71.67%    1 (all ‘no’)
Overall                         1        73.10%    1
Attitude-based clustering       2        75.28%    2
Attitude-based clustering       3        75.17%    3
Attitude-based clustering       4        75.60%    3
Attitude-based clustering       5        75.25%    3
Fit-based clustering            2        77.99%    2
Fit-based clustering            3        81.54%    3
Agglomerative clustering      200        78.13%    4
Agglomerative clustering      200        78.27%    5
PREDICTING USERS’ BEHAVIORS
In this section we predict participants’ allow/reject deci-
sion using machine learning methods. Our goal is to find
a suitable default setting for an IoT privacy-setting inter-
face. Consequently, we do not attempt to find the best
possible solution; instead we make a conscious tradeoff
between parsimony and prediction accuracy.
Our prediction target is the participants’ decision to al-
low or reject the data collection described in each sce-
nario, classifying a scenario as either ‘yes’ or ‘no’. The
scenario parameters serve as input attributes. These are
nominal variables, making decision tree algorithms such
as ID3 and J48 a suitable prediction approach. Unlike
ID3, J48 uses gain ratio as the attribute selection metric,
which is not biased towards input attributes with many
values. We therefore use J48 throughout our analysis.
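To illustrate why gain ratio avoids the many-valued-attribute bias, here is a minimal stand-alone computation of the C4.5/J48-style gain ratio on toy data (not the actual dataset):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    # Information gain of splitting on `attr`, normalized by the split
    # information (the entropy of the attribute itself), as in C4.5/J48.
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Toy scenarios: (who, persistence) -> allow/reject
rows = [("own device", "once"), ("own device", "continuously"),
        ("government", "once"), ("government", "continuously")]
labels = ["yes", "yes", "no", "no"]
print(gain_ratio(rows, labels, 0))  # 'who' perfectly separates -> 1.0
print(gain_ratio(rows, labels, 1))  # 'persistence' is uninformative -> 0.0
```

Dividing by the split information penalizes attributes that fragment the data into many small groups, which plain information gain (as in ID3) rewards spuriously.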
We discuss progressively sophisticated methods for pre-
dicting participants’ decisions. After discussing naive
solutions, we first present a cross-validated tree learning
solution that results in a single “smart default” setting
that is the same for everyone. Subsequently, we dis-
cuss three different procedures that create a number of
“smart profiles” by clustering the participants and cre-
ating a separate cross-validated tree for each cluster. For
each procedure, we try various numbers of clusters. Ac-
curacies of the resulting solutions are reported in Table 5.
Naive Prediction Methods
We start with naive or “information-less” predictions.
Our dataset contains 793 ‘yes’es and 2007 ‘no’s. There-
fore, predicting ‘yes’ for every scenario gives us a 28.32%
prediction accuracy, while making a ‘no’ prediction gives
us an accuracy of 71.67%. In other words, if we disallow
all information collection by default, users will on aver-
age be happy with this default for 71.67% of the settings.
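The naive baseline is simply majority-class prediction; a sketch using the class counts from the dataset:

```python
from collections import Counter

# Majority-class ("information-less") baseline, using the dataset's
# class counts: 793 'yes' vs 2007 'no' decisions.
decisions = ["yes"] * 793 + ["no"] * 2007
majority_label, majority_count = Counter(decisions).most_common(1)[0]
baseline_accuracy = majority_count / len(decisions)  # 2007 / 2800
print(majority_label, round(baseline_accuracy * 100, 1))
```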
Overall Prediction
We next create a “smart default” by predicting the al-
low/reject decision with the scenario parameters using
J48 with Weka’s [11] default settings. The resulting tree
(Figure 2) has an accuracy of 73.10%. The confusion
matrix (Table 6) shows that this model results in overly
conservative settings; only 208 ‘yes’es are predicted.
Table 6: Confusion matrix for the overall prediction
Observed   Predicted Yes   Predicted No   Total
Yes        124 (TP)        669 (FN)       793
No         84 (FP)         1923 (TN)      2007
Total      208             2592           2800
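The reported accuracy and the 208 predicted ‘yes’es follow directly from Table 6:

```python
# Accuracy of the overall J48 model, recomputed from the confusion
# matrix in Table 6 (observed 'yes'/'no' rows, predicted columns).
tp, fn = 124, 669    # observed 'yes': predicted yes / predicted no
fp, tn = 84, 1923    # observed 'no':  predicted yes / predicted no

total = tp + fn + fp + tn      # 2800 scenarios
predicted_yes = tp + fp        # only 208 -> an overly conservative model
accuracy = (tp + tn) / total   # about 0.731
```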
Figure 2: The Overall Prediction decision tree. WHO →
Unknown: NO; Colleague: NO; Friend: NO; Own device: WHAT;
Business: NO; Employer: NO; Government: NO. Further
drill down for who = ‘Own device’ is provided in Table 7.
Figure 2 shows that this model predicts ‘no’ for every
recipient (who) except ‘Own device’. For this value, the
default setting depends on what is being collected (see
Table 7). For some levels of what, there is a further
drill down based on where, persistence, and reason.
We can use this tree to create a “smart default” setting;
in that case, users would on average be content with
73.10% of these settings—a 2% improvement over the
naive “no to everything” default setting.
Given that people differ substantially in their privacy
preferences, it is not surprising that this “one size fits
all” default setting is not very accurate. A better solution
would cluster participants by their privacy preferences,
and then fit a separate tree for each cluster. These
trees could then be used to create “smart profiles” that
new users may choose from. Subsequent sections discuss
several ways of creating such profiles.
Attitude-Based Clustering
Our first “smart profile” solution uses the attitudes
(comfort, risk, appropriateness) participants expressed
for each scenario on a 7-point scale. We averaged the
values per attitude across each participant’s 14 answers,
and ran k-means clustering on that data with 2, 3, 4 and
5 clusters. We then added participants’ cluster assign-
ments to our original dataset, and ran the J48 decision
tree learner on the dataset with the additional cluster
attribute. Accuracies of the resulting solutions are re-
ported in Table 5 under “attitude-based clustering”.
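The clustering step can be sketched as follows, on synthetic stand-in data and using scikit-learn’s KMeans in place of the authors’ exact setup (the cluster centers and spread below are invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the per-participant averages of the three
# 7-point attitude scales (risk, comfort, appropriateness); the real
# values come from averaging each participant's 14 scenario answers.
rng = np.random.default_rng(0)
privacy_averse = rng.normal([6.0, 2.0, 2.0], 0.4, size=(100, 3))
privacy_relaxed = rng.normal([2.0, 6.0, 6.0], 0.4, size=(100, 3))
X = np.clip(np.vstack([privacy_averse, privacy_relaxed]), 1, 7)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
# The cluster assignment would then be appended to each participant's
# scenario rows before learning a separate J48 tree per cluster.
```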
All of the resulting trees had cluster as the root node,
which indicates that the cluster assignment is a very
effective predictor of users’ decisions. This also allows
us to split the trees at the root node, and create separate
default settings for each cluster.
The 2-cluster solution (Figure 3) has a 75.28%
accuracy—a 3.0% improvement over the “smart de-
fault”. This solution results in one profile with ‘no’ for
Table 7: Drill down of the Overall Prediction tree for
who = ‘Own device’
What                Decision
PhoneID             Yes
PhoneID>identity    Yes
Location            No
Location>presence   depends on Reason:
    Safety          Yes
    Commercial      Yes
    Social-related  No
    Convenience     No
    Health-related  Yes
    None            Yes
Voice               No
Voice>gender        depends on Where:
    Your place      No
    Someone else    No
    Semi-public     No
    Public          Yes
Voice>age           No
Voice>identity      Yes
Voice>presence      Yes
Voice>mood          Yes
Photo               No
Photo>gender        No
Photo>age           No
Photo>identity      Yes
Photo>presence      No
Photo>mood          No
Video               No
Video>gender        No
Video>age           No
Video>presence      No
Video>mood          Yes
Video>looking at    depends on Persistence:
    Once            Yes
    Continuous      No
Gaze                No
Gaze>looking at     depends on Reason:
    Safety          Yes
    Commercial      No
    Social-related  No
    Convenience     Yes
    Health-related  Yes
    None            Yes
Figure 3: Attitude-based clustering: 2-cluster tree.
CLUSTER → Cluster 0 (89 users): WHO → Unknown: NO;
Colleague: NO; Friend: WHAT; Own device: YES; Business: NO;
Employer: WHAT; Government: NO. Cluster 1 (111 users): NO.
Further drill down for who = ‘Friend’ or ‘Employer/School’
in Cluster 0 is hidden for space reasons.
everything, while for the other profile the decision de-
pends on the recipient (who). This profile allows any
collection involving the user’s ‘Own device’, and may
allow collection by a ‘Friend’ or an ‘Employer/School’,
depending on what is being collected.
The 3-cluster solution has a slightly lower accuracy of
75.17%, but is more parsimonious than the 2-cluster so-
lution. There is one profile with ‘no’ for everything, one
profile that allows collection by the user’s ‘Own device’
only, and one profile that allows any collection except
when the recipient is ‘Unknown’ or the ‘Government’.
The 4- and 5-cluster solutions have several clusters with
the same sub-tree, and therefore reduce to a 3-cluster
solution with 75.60% and 75.25% accuracy, respectively.
Fit-based clustering
Our fit-based clustering approach clusters participants
without using any additional information. It instead uses
the fit of the tree models to bootstrap the process of sort-
ing participants into clusters. Like many bootstrapping
methods, ours uses random starts and iterative improve-
ments to find the optimal solution.
Random starts: We randomly divide participants into
N separate groups, and learn a tree for each group. This
is repeated until a non-trivial starting solution (i.e., with
distinctly different trees per cluster) is found.
Iterative improvements: Once each of the Ngroups
has a unique decision tree, we evaluate for each partici-
pant which of the trees best represents their 14 decisions.
If this is the tree of a different group, we switch the par-
ticipant to this group. Once all participants are evalu-
ated and put in the group of their best-fitting tree, the
tree in each group is re-learned with the data of the new
group members. This then prompts another round of
evaluations, and this process continues until no further
switches are performed.
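The random-start and iterative-improvement steps above can be sketched as follows. This is a minimal illustration with hypothetical names, using scikit-learn decision trees rather than the paper's exact learner, and showing a single start (the paper repeats the whole process with multiple random starts and cross-validates the final solution):

```python
# Sketch of the fit-based clustering loop: assign participants to N
# groups, learn one decision tree per group, then move each participant
# to the group whose tree best predicts their 14 decisions; repeat
# until no participant switches. Illustrative, not the paper's code.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def fit_based_clusters(X_per_user, y_per_user, n_clusters,
                       init=None, max_iter=50):
    n_users = len(X_per_user)
    # Random start (or a caller-supplied starting assignment).
    assign = rng.integers(n_clusters, size=n_users) if init is None else init
    trees = []
    for _ in range(max_iter):
        trees = []
        for c in range(n_clusters):
            members = [u for u in range(n_users) if assign[u] == c]
            if not members:                      # restart an empty group
                members = [int(rng.integers(n_users))]
            X = np.vstack([X_per_user[u] for u in members])
            y = np.concatenate([y_per_user[u] for u in members])
            trees.append(DecisionTreeClassifier(max_depth=4).fit(X, y))
        # Re-assign each participant to their best-fitting tree.
        new_assign = np.array([
            max(range(n_clusters),
                key=lambda c: trees[c].score(X_per_user[u], y_per_user[u]))
            for u in range(n_users)])
        if np.array_equal(new_assign, assign):   # no switches: converged
            break
        assign = new_assign
    return assign, trees
```

Each participant contributes a small (scenario features, decision) table; the per-group trees act like cluster centroids, so the loop behaves like k-means with decision trees in place of means.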
Since this process is influenced by random chance, it
is repeated in its entirety to find the optimal solution.
Cross-validation is performed in the final step to prevent
over-fitting. Accuracies of the 2- and 3-cluster solutions
are reported in Table 5 under “fit-based clustering”. We
were not able to converge on a higher number of clusters.
The 2-cluster solution has a 77.99% accuracy—a 6.7%
improvement over the “smart default”. One profile has
‘no’ for everything, while the settings in the other profile
depend on who: it allows any collection by the user’s
‘Own device’, and may allow collection by a ‘Friend’s de-
vice’ or an ‘Employer’, depending on what is collected.
The 3-cluster solution (Figure 4) has an 81.54%
accuracy—an 11.5% improvement over the “smart de-
fault”. We find one profile with ‘no’ for everything; one
profile that may allow collection by the user’s ‘Own de-
vice’, depending on what is being collected; and one pro-
file that allows any collection except when the recipient
(who) is ‘Unknown’, the ‘Government’, or a ‘Colleague’,
with settings for the latter depending on the reason.
[Figure 4: Cluster 0 (74 users), Cluster 1 (77 users), Cluster 2 (49 users). One tree is ‘NO’ throughout; one branches on who with Own device: WHAT and ‘NO’ for all other recipients; one branches on who with Unknown: NO, Colleague: REASON, Friend/Own device/Business/Employer: YES, Government: NO. Drill-downs by what, where, reason, and persistence are shown for selected branches.]
Figure 4: Fit-based clustering: 3-cluster tree. Further drill down is hidden for space reasons.
Agglomerative clustering
Our final method for finding “smart profiles” follows a
hierarchical bottom-up (or agglomerative) approach. It
first fits a separate decision tree for each participant, and
then iteratively merges these trees based on similarity.
156 of the initial 200 trees predict “no” for everything
and 34 predict “yes” for everything—these
are grouped together first. For every possible pair of the
remaining 10 trees, the accuracy of the pair is compared
with the mean of the accuracy of each individual tree,
and the pair with the smallest reduction in accuracy is
merged. This process is repeated until we reach the pre-
defined number of clusters.
We were able to merge clusters down to a 5- and 4-
cluster solution. The 3-cluster solution collapsed down
into a 2-cluster solution with one profile of all ‘yes’es
and one profile of all ‘no’s (a somewhat trivial solution
with a relatively bad fit). Accuracies of the 4- and 5-cluster
solutions (Table 5, “agglomerative clustering”) are 78.27%
and 78.13% respectively. For the 4-cluster solution, we
find one profile with ‘no’ for everything, one profile with
‘yes’ for everything, one profile that depends on who,
and another that depends on what. The latter two pro-
files drill down even further on specific values of who
and what, respectively.
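The greedy merge step can be sketched like this (illustrative names; the tree learner and the cross-validation setup are assumptions, not the paper's exact configuration):

```python
# Sketch of agglomerative profile merging: repeatedly merge the pair
# of groups whose joint tree loses the least accuracy relative to the
# mean accuracy of the two separate trees.
from itertools import combinations

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def accuracy(X, y):
    """Cross-validated accuracy of a decision tree on one group's data."""
    tree = DecisionTreeClassifier(max_depth=4)
    return cross_val_score(tree, X, y, cv=KFold(n_splits=3)).mean()

def merge_until(groups, n_clusters):
    """`groups` maps a cluster id to that cluster's (X, y) data."""
    groups = dict(groups)
    while len(groups) > n_clusters:
        best, best_loss = None, np.inf
        for a, b in combinations(groups, 2):
            Xa, ya = groups[a]
            Xb, yb = groups[b]
            Xab = np.vstack([Xa, Xb])
            yab = np.concatenate([ya, yb])
            # Accuracy lost by replacing two trees with one joint tree.
            loss = (accuracy(Xa, ya) + accuracy(Xb, yb)) / 2 \
                   - accuracy(Xab, yab)
            if loss < best_loss:
                best, best_loss = (a, b), loss
        a, b = best
        groups[a] = (np.vstack([groups[a][0], groups[b][0]]),
                     np.concatenate([groups[a][1], groups[b][1]]))
        del groups[b]
    return groups
```

In the paper's setting the process starts from one tree per participant, with the trivial all-‘no’ and all-‘yes’ trees grouped together first, so only the remaining trees enter this pairwise loop.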
Discussion of Machine Learning Results
A comparison of the accuracies of the presented ap-
proaches is shown in Figure 5. Compared to a naive
default setting (all ‘no’), a “smart default” makes a
2.0% improvement. The fit-based 2-cluster solution re-
sults in two “smart profiles” that make another 6.7%
improvement over the “smart default”, while the three
“smart profiles” of the fit-based 3-cluster solution make
an 11.5% improvement. If we let users choose the best
option among these three profiles, they will on average
be content with 81.54% of the settings. This rivals the
accuracy of some of the “active tracking” machine learning
approaches (cf. [24]).
[Figure 5: bar chart, “Overview of model accuracies”, comparing
accuracy (%) of Naïve (1), Attitude (2), Fit (2), Attitude (3),
Fit (3), Agglomerative (4), and Agglomerative (5).]
Figure 5: Accuracy of our clustering approaches
In line with our statistical results, the factor who seems
to be the most prominent parameter in our profiles, fol-
lowed by what. In some cases the settings are more
complex, depending on a combination of who and what.
This is in line with the interaction effect observed in our
statistical results.
Even our most accurate solution is not without fault,
and its accuracy depends most on the who parameter.
Specifically, the solution is most accurate for the user’s
own device, the device of a friend, and when the recip-
ient is unknown. It is however less accurate when the
recipient is a colleague, a nearby business, an employer,
or the government. In these scenarios, more misclassifi-
cations tend to happen, so it would be useful to ‘guide’
users to specifically have a look at these default settings,
should they opt to make any manual overrides.
PRIVACY-SETTING PROTOTYPES
Designers of IoT privacy-setting interfaces face a diffi-
cult challenge. Since there currently exists no system
for setting one’s privacy preferences for public IoT sce-
narios, designers of such an interface must rely on ex-
isting data such as the Lee and Kobsa [16] dataset to
inform the design of these interfaces. Moreover, even for
the simplified scenario-based examples in this dataset,
a privacy-setting interface will likely be complex, as it
requires users to navigate settings for 7 types of recipi-
ents (who), 24 types of information (what), 4 different
locations (where), 6 different purposes (reason), and
decide whether they want to allow the collection once or
continuously (persistence). In this section we employ
our data-driven design methodology to develop a proto-
type for an IoT privacy-setting interface based on the
results of our statistical and machine learning analyses.
Manual Settings
The first challenge is to design an interface that users
can navigate manually. Using the results of our statis-
tical analyses, we design a “layered” settings interface:
users can make a decision based on a single parameter
only, and choose ‘yes’, ‘no’, or ‘it depends’ for each pa-
rameter value. If they choose ‘it depends’, they move to
a next layer, where the decision for that parameter value
is broken down by another parameter.
The manual interface is shown in Screens 2-4 of Figure 6.
At the top layer of this interface should be the scenario
parameter that is most influential in our dataset. Our
statistical results inform us that this is the who param-
eter. Screen 2 shows how users can allow/reject data
collection for each of the 7 types of recipients. Users can
choose “more”, which brings them to the second-most
important scenario parameter, i.e. the what parame-
ter. Screen 3 shows the data type options for when the
user clicks on “more” for “Friends’ devices”. We have
conveniently grouped the options by collection medium.
Users can turn the collection of various data types by
their friends’ devices on or off. If only some types of data
are allowed, the toggle at the higher level turns yellow
and moves to a middle position, indicating that it is
not completely ‘on’ (see “Friends’ devices” in Screen 2).
Screen 4 shows how users can drill down even further
to specify reasons for which collection is allowed, and
the allowed persistence (we combined these two pa-
rameters in a single screen to reduce the “depth” of our
interface). Since reason and persistence explain rela-
tively little variance in behavioral intention, we expect
that only a few users will go this deep into the inter-
face for a small number of their settings. We leave out
where altogether, because our statistical results deemed
this parameter to be non-significant.
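The layered yes/no/‘it depends’ structure lends itself to a simple nested representation. The sketch below is not the prototype's actual data model; the setting values and helper are made up purely to illustrate the drill-down mechanics, with layers ordered who > what > reason, as in the interface:

```python
# Tri-state layered settings: a plain value decides at that layer,
# while a nested dict means 'it depends' and defers to the next layer.
ALLOW, DENY, DEPENDS = "yes", "no", "it depends"

settings = {
    "My own devices": ALLOW,
    "Friends' devices": {          # 'it depends': drill down by what
        "identity": ALLOW,
        "voice": {                 # 'it depends': drill down by reason
            "Safety": "continuously",
            "Health": "once",
            "Commercial": "never",
        },
        "photo": DENY,
    },
    "Unknown devices": DENY,
    "Government devices": DENY,
}

def resolve(node, *path):
    """Walk down the layers; stop early if a higher layer already decided."""
    for key in path:
        if not isinstance(node, dict):
            return node            # decided at a higher layer
        node = node[key]
    return DEPENDS if isinstance(node, dict) else node
```

A query at any depth either returns a decision or reports ‘it depends’, which is exactly the condition under which the interface shows the “more” drill-down for that row.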
Smart Default Setting
The next challenge is to decide on a default setting, so
that users only have to make minimal adjustments to
their settings. We can use a simple “yes to everything”
or “no to everything” default, but these defaults are on
average only accurate 28.33% and 71.67% of the time, re-
spectively. Using the results from our Overall Prediction
(see Figure 2), we can create a “smart default” setting
that is 73.67% accurate on average. In this version, the
IoT settings for all devices are set to ‘off’, except for
‘My own device’, which will be set to the middle option.
Table 7 shows the default settings at deeper levels.
As this default setting is on average only 73.67% accu-
rate, we expect users to still change some of their set-
tings. They can do this by simply navigating the inter-
face presented in Figure 6.
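The accuracy of any default can be computed directly from a decision dataset: it is the average fraction of a user's decisions that the default matches. A minimal sketch with synthetic data (not the Lee and Kobsa dataset):

```python
# Scoring a flat default against users' allow/deny decisions.
# Rows are participants, columns are the 14 scenario decisions
# (1 = allow, 0 = reject); the data here is synthetic.
import numpy as np

def default_accuracy(decisions, default):
    """Mean fraction of per-user decisions matched by the default."""
    return float((decisions == default).mean())

rng = np.random.default_rng(1)
decisions = (rng.random((200, 14)) < 0.3).astype(int)  # ~30% 'allow'

all_no = np.zeros(14, dtype=int)
all_yes = np.ones(14, dtype=int)
# For binary decisions the two naive defaults are complementary:
# default_accuracy(decisions, all_no) + default_accuracy(decisions, all_yes)
# always equals 1.
```

A “smart default” is scored the same way, except that the prediction varies per scenario (one column per leaf of the overall decision tree) instead of being a constant.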
Smart Profiles
To improve the accuracy of the default setting, we can
instead build several “smart profiles”, and allow the user
to choose among them. Using the 3-cluster solution of
the fit-based approach (see Figure 4), we can attain an
accuracy of 81.54%.
Screen 1 in Figure 6 shows a selection screen where the
user can choose between these three profiles. The “Lim-
ited collection” profile allows the collection of any infor-
mation by the user’s own devices, their friends’ devices,
their employer/school’s devices, and devices of nearby
businesses. Devices of colleagues are only allowed to
collect information for certain reasons. The “Limited
collection, personal devices only” profile only allows the
collection of certain types of information by the user’s
own devices. The “No collection” profile does not allow
any data collection to take place by default.
Once the user chooses a profile, they will move to the
manual settings interface (Screens 2–4), where they can
further change some of their settings.
CONCLUSION
The motivation behind our research was the informa-
tion and choice overload associated with the plethora of
choices that users might face while setting their privacy
settings in an IoT environment. We have made use of
statistical analyses and machine learning algorithms to
provide a data-driven design for an IoT privacy-setting
interface. We summarize this procedure as follows:
Using statistical analysis, uncover the relative impor-
tance of the parameters that influence users’ privacy
decisions. Develop a “layered interface” in which de-
cision parameters are presented in decreasing order of
importance.
[Figure 6 reproduces four prototype screens. Screen 1 (“Default profiles: Please select a profile — you can change individual settings on the next screen”) offers the “Limited collection”, “Limited collection, personal devices only”, and “No collection” profiles, each with a short description and a “learn more…” link. Screen 2 (“IoT Settings: Which devices may collect your personal information?”) lists the seven recipient types, each with a toggle and a “more” link. Screen 3 (“What type of data may your friends’ devices collect?”) lists data types (identity, gender, age, mood, presence, other), grouped by collection medium such as “Voice, to determine my…” and “Photos, to determine my…”. Screen 4 asks “For what purpose may your friends’ devices record your voice to determine your age?”, with never/once/continuously options per reason (Safety, Health, Convenience).]
Figure 6: From left, Screen 1 shows the three default profiles; Screens 2, 3, and 4 show the layered interface
Using a tree-learning algorithm, create a decision tree
that best predicts participants’ choices based on the
parameters. Use this tree to create a “smart default”
setting.
Using a combination of clustering and tree-learning
algorithms, create a set of N decision trees that best
predict participants’ choices. Use the trees to create
N “smart profiles”.
Develop a prototype for an IoT privacy-setting in-
terface that integrates the layered interface with the
smart default or the smart profiles.
We demonstrated this procedure by applying it to a
dataset collected by Lee and Kobsa [16]. In the process,
we made a number of interesting observations.
The statistical and machine learning results both indi-
cated that recipient of the information (who) is the most
significant parameter in users’ decision to allow or reject
IoT-based information collection. This parameter there-
fore features at the forefront in our layered settings inter-
face, and plays an important role in our smart profiles.
The what parameter was the second-most important de-
cision parameter, and interacted significantly with the
who parameter. This parameter therefore features at
the second level of our settings interface, and further
qualifies some of the settings in our smart profiles.
Our layered interface allows a further drill-down to the
reason and persistence parameters, but given the
relatively lesser importance of these parameters, we expect
few users to engage with the interface at this level. More-
over, the where parameter was not significant, so we left
it out of the interface.
While a naive (‘no’ to all) default setting in our interface
would have provided an accuracy of 71.67%, it would not
have allowed users who do not change the default setting
to reap the potential benefits associated with IoT data
collection. Our Overall Prediction procedure resulted in
a smart default setting that was a bit more permissive,
and increased the accuracy by 2%.
Our fit-based clustering approach, which iteratively clus-
ters users and fits an optimal tree in each cluster, pro-
vided the best solution. This resulted in an interface
where users can choose from 3 profiles, which increases
the accuracy by another 11.5%.
In sum, our analysis allowed us to develop an IoT
privacy-setting interface that may serve as groundwork
for future research. The goal of this paper was to use
data-driven design to bootstrap the development of a
privacy-setting interface, but a future user experiment
could investigate whether users are comfortable with
the layered interface, and whether they prefer a single
“smart default” setting or a choice among “smart pro-
files”.
Future work could also apply the proposed procedure to
other privacy-setting domains. In using scenarios, the
procedure avoids typical decision externalities such as
default effects, framing effects, and decision-context ef-
fects that tend to obfuscate users’ behaviors in more nat-
uralistic studies. Moreover, the scenarios can inform the
creation of privacy-setting interfaces for novel or cur-
rently non-existent technologies. As such we imagine
that the procedure could be applied in new domains,
such as household IoT (“smart home”) privacy, drone
privacy, and nano-tech privacy. In some of these do-
mains, fully “adaptive” privacy mechanisms that use
“active tracking” (cf. [13, 18]) are more suitable, while
other domains could benefit from our static, profile-
based approach.
REFERENCES
1. Acquisti, A., and Gross, R. Imagined
communities: Awareness, information sharing, and
privacy on the facebook. In International workshop
on privacy enhancing technologies (2006), Springer,
pp. 36–58.
2. Ajzen, I., and Fishbein, M. Attitude-behavior
relations: A theoretical analysis and review of
empirical research. Psychological bulletin 84, 5
(1977).
3. Baron, R. M., and Kenny, D. A. The
moderator–mediator variable distinction in social
psychological research: Conceptual, strategic, and
statistical considerations. Journal of personality
and social psychology 51, 6 (1986), 1173.
4. Boyles, J. L., Smith, A., and Madden, M.
Privacy and Data Management on Mobile Devices.
Tech. rep., Pew Internet & American Life Project,
2012.
5. Chow, R., Egelman, S., Kannavara, R., Lee,
H., Misra, S., and Wang, E. HCI in Business:
A Collaboration with Academia in IoT Privacy. In
HCI in Business, F. F.-H. Nah and C.-H. Tan,
Eds., no. 9191 in Lecture Notes in Computer
Science. Springer International Publishing, 2015.
6. Dong, C., Jin, H., and Knijnenburg, B. P.
Ppm: A privacy prediction model for online social
networks. In International Conference on Social
Informatics (2016), Springer, pp. 400–420.
7. Fang, L., and LeFevre, K. Privacy wizards for
social networking sites. In Proceedings of the 19th
international conference on World wide web (2010),
ACM, pp. 351–360.
8. Good, N., Dhamija, R., Grossklags, J.,
Thaw, D., Aronowitz, S., Mulligan, D., and
Konstan, J. Stopping Spyware at the Gate: A
User Study of Privacy, Notice and Spyware. In
Proceedings of the 2005 Symposium on Usable
Privacy and Security (2005), ACM, pp. 43–52.
9. Gross, R., and Acquisti, A. Information
revelation and privacy in online social networks. In
Proceedings of the 2005 ACM workshop on Privacy
in the electronic society (2005), ACM, pp. 71–80.
10. Gubbi, J., Buyya, R., Marusic, S., and
Palaniswami, M. Internet of things (iot): A
vision, architectural elements, and future
directions. Future generation computer systems 29,
7 (2013), 1645–1660.
11. Hall, M., Frank, E., Holmes, G.,
Pfahringer, B., Reutemann, P., and Witten,
I. H. The weka data mining software: an update.
ACM SIGKDD explorations newsletter 11, 1
(2009), 10–18.
12. Jensen, C., and Potts, C. Privacy Policies as
Decision-Making Tools: An Evaluation of Online
Privacy Notices. In 2004 Conference on Human
Factors in Computing Systems (2004), pp. 471–478.
13. Knijnenburg, B. P. A user-tailored approach to
privacy decision support. Ph.D. Thesis, University
of California, Irvine, Irvine, CA, 2015.
14. Knijnenburg, B. P., Kobsa, A., and Jin, H.
Dimensionality of information disclosure behavior.
International Journal of Human-Computer Studies
71, 12 (2013), 1144–1162.
15. Lederer, S., Mankoff, J., and Dey, A. K.
Who wants to know what when? privacy preference
determinants in ubiquitous computing. In CHI’03
extended abstracts on Human factors in computing
systems (2003), ACM, pp. 724–725.
16. Lee, H., and Kobsa, A. Understanding user
privacy in internet of things environments. Internet
of Things (WF-IoT) (2016).
17. Li, Y., Kobsa, A., Knijnenburg, B. P., and
Nguyen, M. C. Cross-cultural privacy prediction.
Proceedings on Privacy Enhancing Technologies 2
(2017), 93–112.
18. Liu, B., Andersen, M. S., Schaub, F.,
Almuhimedi, H., Zhang, S. A., Sadeh, N.,
Agarwal, Y., and Acquisti, A. Follow My
Recommendations: A Personalized Privacy
Assistant for Mobile App Permissions. In
Proceedings of the 2016 Symposium on Usable
Privacy and Security (2016).
19. Madejski, M., Johnson, M., and Bellovin,
S. M. A study of privacy settings errors in an
online social network. In IEEE International
Conference on Pervasive Computing and
Communications Workshops (2012), IEEE,
pp. 340–345.
20. Olson, J. S., Grudin, J., and Horvitz, E. A
study of preferences for sharing and privacy. In
CHI’05 extended abstracts on Human factors in
computing systems (2005), ACM, pp. 1985–1988.
21. Pallapa, G., Das, S. K., Di Francesco, M.,
and Aura, T. Adaptive and context-aware
privacy preservation exploiting user interactions in
smart environments. Pervasive and Mobile
Computing 12 (2014), 232–243.
22. Raber, F., Luca, A. D., and Graus, M.
Privacy wedges: Area-based audience selection for
social network posts. In Proceedings of the 2016
Symposium on Usable Privacy and Security (2016).
23. Ravichandran, R., Benisch, M., Kelley,
P. G., and Sadeh, N. M. Capturing social
networking privacy preferences. In Proceedings of
the 2009 Symposium on Usable Privacy and
Security (2009), Springer, pp. 1–18.
24. Sadeh, N., Hong, J., Cranor, L., Fette, I.,
Kelley, P., Prabaker, M., and Rao, J.
Understanding and capturing people’s privacy
policies in a mobile social networking application.
Personal and Ubiquitous Computing 13, 6 (2009),
401–412.
25. Sandhu, R. S., and Samarati, P. Access
control: principle and practice. IEEE
Communications Magazine 32, 9 (1994), 40–48.
26. Smith, N. C., Goldstein, D. G., and Johnson,
E. J. Choice Without Awareness: Ethical and
Policy Implications of Defaults. Journal of Public
Policy & Marketing 32, 2 (2013), 159–172.
27. Watson, J., Besmer, A., and Lipford, H. R.
+Your circles: sharing behavior on Google+. In
Proceedings of the 8th Symposium on Usable
Privacy and Security (2012), ACM, pp. 12:1–12:10.
28. Williams, M., Nurse, J. R., and Creese, S.
The perfect storm: The privacy paradox and the
internet-of-things. In Availability, Reliability and
Security (ARES), 2016 11th International
Conference on (2016), IEEE, pp. 644–652.
29. Wisniewski, P. J., Knijnenburg, B. P., and
Lipford, H. R. Making privacy personal:
Profiling social network users to inform privacy
education and nudging. International Journal of
Human-Computer Studies 98 (2017), 95–108.
... In doing so, we shed light on the global phenomenon of identity-based harassment targeting women, LGBTQIA+, and racial/ethnic minority user groups in social VR. Second, the large-scale and comprehensive nature of our study enables us to provide several previously un-or underexplored understandings, including: (1) unpacking the relationship between identity-based harassment in social VR and (mis)perceptions of selective identity revelation practices; (2) new evidence of how embodying one's offline identity in social VR may be less risky than previously thought by researchers and users alike; and (3) despite this more positive picture, different social VR user groups indeed experience significantly different frequencies of harassment for certain identity characteristics, namely by user gender identity, sexuality, and race/ethnicity. Third, we propose two primary implications for investigating and designing safer social VR spaces: re-orientating social VR research and development towards previously underexplored aspects of identity-based harassment and its connection to (mis)perceived identity revelation , and re-approaching how the interplay between the two should be communicated to social VR users. ...
... Our survey design utilized a form of repeated measures that produced missing data (i.e., not all participants indicated presenting or disclosing or encountering misperceptions about all identity characteristic types). Therefore, our analysis leveraged linear mixed effects (LME) modeling because it is particularly appropriate for repeated measures on the same participant and retains robustness with missing data [1]. All LMEs conducted for this study first began by running a model for the dependent variable (e.g., frequency of harassment) with a random intercept to account for repeated measures. ...
Article
Full-text available
The popularity of social virtual reality (VR) platforms such as VRChat has led to growing concerns about new and more severe forms of online harassment targeting one's identity characteristics (i.e., identity-based harassment ). Social VR users with marginalized identities (e.g., women, LGBTQIA+ individuals, and racial/ethnic minorities) have been reported as particularly vulnerable to such harassment. This is mainly because social VR can make one's offline identity known to others (i.e., what we term identity revelation in this work) through a unique combination of avatar design, voice use, and immersive full- or partial-body tracking. To address these safety concerns, there is an urgent need to unpack the complex dynamics surrounding how one's offline identity is (mis)perceived by others, and how these (mis)perceptions may affect identity-based harassment in social VR. This study thus utilizes a large-scale survey with 223 social VR users across six continents/regions of the world with varying social VR experiences and identities to investigate (1) the relationship between identity-based harassment in social VR and (mis)perceptions of selective identity revelation practices, (2) how embodying one's identity in social VR might actually be less risky than once thought, and (3) how who you are does still matter when it comes to identity-based harassment in social VR. It also highlights the need to better account for understudied aspects of identity-based harassment in social VR and to better educate social VR users on the interplay between harassment and (mis)perceived identity revelation in these spaces.
... However, such deciders could be explicitly addressed to improve their result quality, e.g., by applying a custom nudge (like N13: Empathy Instigation) or requesting another person to check their decisions. ➤ Deciders have the last say (C4): We re-grouped the user study decisions to simulate reviews with only correct and only incorrect N06: Choice Defaults (compare smart defaults [4,5]). In reality, every decider had to make 160 decisions, of which 80 were T P (should be accepted) and 80 were FP (should be removed). ...
... While these revokes cause some false rejects, false accepts would be worse as they create a false sense of security by legitimating excessive authorizations. For future work, we invite researchers to study the ARP, to investigate other digital nudges of Table 2 or their combinations, or to replicate this study with a larger sample size or smart defaults [4,5]. In sum, digital nudges are a promising tool to improve access reviews but need careful application. ...
Conference Paper
Full-text available
Organizations tend to over-authorize their members, ensuring smooth operations. However, these excessive authorizations offer a substantial attack surface and are the reason regulative authorities demand periodic checks of their authorizations. Thus, organizations conduct time-consuming and costly access reviews to verify these authorizations by human decision-makers. Still, these deciders only marginally revoke authorizations due to the poor usability of access reviews. In this work, we apply digital nudges to guide human deciders during access reviews to tackle this issue and improve security. In detail, we formalize the access review problem, interview experts (n = 10) to identify several nudges helpful for access reviews, and conduct a user study (n = 102) for the Choice Defaults Nudge. We show significant behavior changes in revoking authorizations. We also achieve time savings and less stress. However, we also found that improving the overall quality requires more advanced means. Finally, we discuss design implications for access reviews with digital nudges.
... However, there is no discussion about detecting other environments automatically, and the multi-environment concept is kept inside the same home. Bahirat et al. 33 present a data-driven approach to improve privacy. Using machine learning techniques, default smart profiles are generated and recommended to other users. ...
Article
The Internet of Things enhances the quality of life by automating tasks and streamlining human-device interactions. However, manual device management remains time-consuming, especially in multiple or new environments that demand new settings and interactions. Learning systems aid in automating task management, but their learning times hinder personalization and struggle when the system has to interact with multiple IoT environments, impacting user experience. This paper aims to optimize knowledge sharing for IoT environments, proposing a framework that utilizes recommender systems to find optimal and reusable configurations among IoT environments and users. To that end, this work leverages teacher-student relationships in Knowledge Distillation, facilitating knowledge sharing and enhancing knowledge reuse in learning models. In addition, real-time processing eliminates training time. This approach achieves a remarkable 93.15% accuracy.
... • Privacy such as data harvesting and surveillance, are still present, making it necessary for metaverse platforms to prioritize privacy by implementing policies and technologies that limit data collection and sharing [64]. Additionally, data-driven consent mechanisms can address users' privacy concerns based on individual preferences [65]. ...
Article
Full-text available
The growing interest in the metaverse has led to an abundance of platforms, each with its own unique features and limitations. This paper’s objective is two-fold. First, we aim at providing an objective analysis of requirements that need to be fulfilled by metaverse platforms. We survey a broad set of criteria including interoperability, immersiveness, persistence, multimodal and social interaction, scalability, level of openness, configurability, market access, security, and blockchain integration, among others. Second, we review a wide range of existing metaverse platforms, and we critically evaluate their ability to meet the requirements listed. We identify their limitations, which must be addressed to establish fair, trustworthy, and interactive experiences within the metaverse ecosystem. Looking forward, we highlight the need for further research and development in areas such as decentralization, improved security and privacy measures, and the integration of emerging technologies like blockchain and AI, as essential building blocks for a resilient and secure metaverse.
... great deal of insight into the individual's personal life. Due to the advent of IoT and the proliferation of connected devices, people must find ways to preserve their privacy to ensure their safety in a digital age (Bahirat et al., 2018;Chong et al., n.d.). ...
Chapter
Full-text available
With the Internet of Things (IoT) growing steadily, a wide range of application fields are being offered. These include monitoring health, weather, smart homes, autonomous vehicles, and so on. The result is the incorporation of solutions in various commercial and residential areas and the eventual emergence of them as ubiquitous objects in everyday life. Due to such circumstances, cybersecurity would be essential to mitigate risks, such as data exposure, denial of service efforts, malicious system exploitation, etc. A large majority of entry-level IoT consumer devices lack adequate protection systems, which makes them susceptible to a wide range of malicious attacks. The chapter discusses IoT architectures in depth, along with an analysis of potential applications. A detailed and thorough analysis of challenges in the IoT domain is also provided, emphasizing flaws in current commercial IoT solutions and the importance of designing IoT solutions with security and privacy in mind.
Chapter
This chapter provides an overview of solutions for achieving usable privacy. First, the need for combining human-centred and privacy by design approaches is highlighted. Moreover, it is discussed how important challenges for usable privacy can be approached by available solutions. These include example approaches for considering culturally dependent privacy personas, developing usable Privacy-Enhancing Technology (PET) configuration tools through interdisciplinary efforts, raising users’ attention to privacy as a secondary goal via engaging them with the policy content, designing usable multi-layered privacy notices and usable privacy management via semi-automation, and achieving usable transparency through usable explanations of PETs and different forms of visualisation of data disclosures. Finally, we discuss how fundamental legal privacy requirements map to Human-Computer Interaction (HCI) requirements and HCI solutions, focusing on the solutions discussed in this chapter.
Chapter
This final chapter provides key takeaways reflecting on lessons learnt from usable privacy research. Moreover, it discusses current and future challenges related to usable privacy that are associated with AI technology advancements. Finally, conclusions are drawn to make the point that interdisciplinary privacy research is needed to adequately address human aspects.
Conference Paper
Full-text available
Modern smartphone platforms have millions of apps, many of which request permissions to access private data and resources, like user accounts or location. While these smartphone platforms provide varying degrees of control over these permissions, the sheer number of decisions that users are expected to manage has been shown to be unrealistically high. Prior research has shown that users are often unaware of, if not uncomfortable with, many of their permission settings. Prior work also suggests that it is theoretically possible to predict many of the privacy settings a user would want by asking the user a small number of questions. However, this approach has neither been operationalized nor evaluated with actual users before. We report on a field study (n=72) in which we implemented and evaluated a Personalized Privacy Assistant (PPA) with participants using their own Android devices. The results of our study are encouraging. We find that 78.7% of the recommendations made by the PPA were adopted by users. Following initial recommendations on permission settings, participants were motivated to further review and modify their settings with daily "privacy nudges." Despite showing substantial engagement with these nudges, participants only changed 5.1% of the settings previously adopted based on the PPA's recommendations. The PPA and its recommendations were perceived as useful and usable. We discuss the implications of our results for mobile permission management and the design of personalized privacy assistant solutions.
Article
Full-text available
The influence of cultural background on people’s privacy decisions is widely recognized. However, a cross-cultural approach to predicting privacy decisions is still lacking. Our paper presents a first integrated cross-cultural privacy prediction model that merges cultural, demographic, attitudinal and contextual prediction. The model applies supervised machine learning to users’ decisions on the collection of their personal data, collected from a large-scale quantitative study in eight different countries. We find that adding culture-related predictors (i.e. country of residence, language, Hofstede’s cultural dimensions) to demographic, attitudinal and contextual predictors in the model can improve the prediction accuracy. Hofstede’s variables, particularly individualism and indulgence, outperform country and language. We further apply generalized linear mixed-effect regression to explore possible interactions between culture and other predictors. Indeed, we find that the impact of contextual and attitudinal predictors varies between cultures. The implications of such models for developing privacy-enabling technologies are discussed.
Conference Paper
Full-text available
Privacy is a concept found throughout human history, and opinion polls suggest that the public value this principle. However, while many individuals claim to care about privacy, they are often perceived to behave to the contrary. This phenomenon is known as the Privacy Paradox, and its existence has been validated through numerous psychological, economic and computer science studies. Several contributory factors have been suggested, including user interface design, risk salience, social norms and default configurations. We posit that the further proliferation of the Internet of Things (IoT) will aggravate many of these factors, posing even greater risks to individuals' privacy. This paper explores the evolution of both the paradox and the IoT, discusses how privacy risk might change over the coming years, and suggests further research required to achieve a reasonable balance. We believe both technological and socio-technical measures are necessary to ensure privacy is protected in a world of ubiquitous technology.
Article
Social Network Sites (SNSs) offer a plethora of privacy controls, but users rarely exploit all of these mechanisms, nor do they do so in the same manner. We demonstrate that SNS users instead adhere to one of a small set of distinct privacy management strategies that are partially related to their level of privacy feature awareness. Using advanced Factor Analysis methods on the self-reported privacy behaviors and feature awareness of 308 Facebook users, we extrapolate six distinct privacy management strategies (Privacy Maximizers, Selective Sharers, Privacy Balancers, Self-Censors, Time Savers/Consumers, and Privacy Minimalists), as well as six classes of privacy proficiency based on feature awareness, ranging from Novices to Experts. We then cluster users on these dimensions to form six distinct behavioral profiles of privacy management strategies and six awareness profiles for privacy proficiency. We further analyze these privacy profiles to suggest opportunities for training and education, interface redesign, and new approaches for personalized privacy recommendations.
Conference Paper
Online Social Networks (OSNs) have come to play an increasingly important role in our social lives, and their inherent privacy problems have become a major concern for users. Can we assist consumers in their privacy decision-making practices, for example by predicting their preferences and giving them personalized advice? To this end, we introduce PPM: a Privacy Prediction Model, rooted in psychological principles, which can be used to give users personalized advice regarding their privacy decision-making practices. Using this model, we study psychological variables that are known to affect users' disclosure behavior: the trustworthiness of the requester/information audience, the sharing tendency of the receiver/information holder, the sensitivity of the requested/shared information, the appropriateness of the request/sharing activities, as well as several more traditional contextual factors.
Conference Paper
The Internet of Things (IoT) integrates communication capabilities into physical objects to create a ubiquitous and multi-modal network of information and computing resources. The promise and pervasiveness of IoT ecosystems has lured many companies, including Intel, to devote resources and engineers to participate in the future of IoT. This paper describes a joint effort from Intel and two collaborators from academia to address the problem of IoT privacy.