Robustness of Meta Matrix Factorization
Against Strict Privacy Constraints
Peter Muellner¹, Dominik Kowald¹, and Elisabeth Lex²
¹ Know-Center GmbH, Graz, Austria
{pmuellner,dkowald}@know-center.at
² Graz University of Technology, Graz, Austria
elisabeth.lex@tugraz.at
Abstract. In this paper, we explore the reproducibility of MetaMF, a
meta matrix factorization framework introduced by Lin et al. MetaMF
employs meta learning for federated rating prediction to preserve users’
privacy. We reproduce the experiments of Lin et al. on five datasets,
i.e., Douban, Hetrec-MovieLens, MovieLens 1M, Ciao, and Jester. Also,
we study the impact of meta learning on the accuracy of MetaMF’s
recommendations. Furthermore, in our work, we acknowledge that users
may have different tolerances for revealing information about themselves.
Hence, in a second strand of experiments, we investigate the robustness
of MetaMF against strict privacy constraints. Our study illustrates that
we can reproduce most of Lin et al.’s results. Plus, we provide strong
evidence that meta learning is essential for MetaMF’s robustness against
strict privacy constraints.
Keywords: Recommender systems · Privacy · Meta learning · Federated learning · Reproducibility · Matrix factorization
1 Introduction
State-of-the-art recommender systems learn a user model from user and item
data and the user’s interactions with items to generate personalized recommen-
dations. In that process, however, users’ personal information may be exposed,
resulting in severe privacy threats. As a remedy, recent research makes use of
techniques like federated learning [2,4,6] or meta learning [7,20] to ensure pri-
vacy in recommender systems. In the federated learning paradigm, no data ever
leaves a user's device, and as such, the leakage of their data to other parties is prevented. With meta learning, a model gains the ability to form its hypothesis
based on a minimal amount of data.
Similar to recent work [5,15], MetaMF by Lin et al. [16] combines federated
learning with meta learning to provide personalization and privacy. Besides,
MetaMF exploits collaborative information among users and distributes a private
rating prediction model to each user. Due to MetaMF’s recency and its clear
focus on increasing privacy for users via a novel framework, we are interested
in the reproducibility of Lin et al.’s research. Additionally, we aim to contribute
our own branch of research regarding privacy, i.e., MetaMF’s robustness against
strict privacy constraints. This is motivated by a statement of Lin et al. about one
critical limitation of MetaMF, i.e., its sensitivity to data scarcity that could arise
when users employ strict privacy constraints by withholding a certain amount of
their data. In this regard, every user has a certain privacy budget, i.e., a budget
of private data she is willing to share. Thus, in our paper at hand, the privacy
budget is considered a measure of how much data disclosure a user tolerates
and is defined as the fraction of rating data she is willing to share with others.
Thereby, employing a small privacy budget, and thus withholding data, serves as a realization of strict privacy constraints.
Our work addresses MetaMF’s limitation against data scarcity and is struc-
tured in two parts. First, we conduct a study with the aim to reproduce the
results given in the original work by Lin et al. Concretely, we investigate two lead-
ing research questions, i.e., RQ1a: How does MetaMF perform on a broad body
of datasets? and RQ1b: What evidence does MetaMF provide for personaliza-
tion and collaboration? Second, we present a privacy-focused study, in which we
evaluate the impact of MetaMF’s meta learning component and test MetaMF’s
performance on users with different amounts of rating data. Here, we investigate
two more research questions, i.e., RQ2a: What is the role of meta learning in the
robustness of MetaMF against decreasing privacy budgets? and RQ2b: How do
limited privacy budgets affect users with different amounts of rating data? We
address RQ1a and RQ1b in Sect. 3 by testing MetaMF's predictive capabilities
on five different datasets, i.e., Douban, Hetrec-MovieLens, MovieLens 1M, Ciao,
and Jester. Here, we find that most results provided by Lin et al. can be repro-
duced. In Sect. 4, we elaborate on RQ2a and RQ2b by examining MetaMF in
the setting of decreasing privacy budgets. Here, we provide strong evidence of
the important role of meta learning in MetaMF’s robustness. Besides, we find
that users with large amounts of rating data are substantially disadvantaged by
decreasing privacy budgets compared to users with few rating data.
2 Methodology
In this section, we illustrate our methodology of addressing RQ1a and RQ1b,
i.e., the reproducibility of Lin et al. [16], and RQ2a and RQ2b, i.e., MetaMF’s
robustness against decreasing privacy budgets.
2.1 Approach
MetaMF. Lin et al. recently introduced a novel matrix factorization framework
in a federated environment leveraging meta learning. Their framework comprises
three steps. First, collaborative information among users is collected and sub-
sequently, utilized to construct a user’s collaborative vector. This collaborative
vector serves as basis of the second step. Here, in detail, the parameters of
a private rating prediction model are learned via meta learning. Plus, in par-
allel, personalized item embeddings, representing a user’s personal “opinion”
about the items, are computed. Finally, in the third step, the rating of an item
is predicted utilizing the previously learned rating prediction model and item
embeddings. We resort to MetaMF to address RQ1a, RQ1b, and RQ2b, i.e., the
reproducibility of results presented by Lin et al. and the influence of decreasing
privacy budgets on users with different amounts of rating data.
NoMetaMF. In our privacy-focused study, RQ2a addresses the role of meta
learning in MetaMF’s robustness against decreasing privacy budgets. Thus, we
conduct experiments with and without MetaMF’s meta learning component. For
the latter kind of experiments, we introduce NoMetaMF, a variant of MetaMF
with no meta learning. In MetaMF, a private rating prediction model is gen-
erated for each user by leveraging meta learning. The authors utilize a hyper-
network [11], i.e., a neural network, coined meta network, that generates the
parameters of another neural network. Based on the user’s collaborative vector
c_u, the meta network generates the parameters of the rating prediction model, i.e., weights W_l^u and biases b_l^u for layer l and user u. This is given by

h = ReLU(W_h c_u + b_h)            (1)

W_l^u = U_{W_l^u} h + b_{W_l^u}    (2)

b_l^u = U_{b_l^u} h + b_{b_l^u}    (3)

where h is the hidden state with the widely-used ReLU(x) = max(0, x) [8,12] activation function, W_h, U_{W_l^u}, U_{b_l^u} are the weights, and b_h, b_{W_l^u}, b_{b_l^u} are the biases of the meta network. NoMetaMF excludes meta learning by disabling backpropagation through the meta network in Eqs. 1–3. Thus, the meta parameters W_h, U_{W_l^u}, U_{b_l^u}, b_h, b_{W_l^u}, b_{b_l^u} will not be learned in NoMetaMF. While backpropagation is disabled in the meta network, the parameters W_l^u and b_l^u are still learned over those non-meta parameters in NoMetaMF to obtain the collaborative vector. Hence, the parameters of the rating prediction models are still learned for each user individually, but without meta learning.
Lin et al. also introduce a variant of MetaMF, called MetaMF-SM, which
should not be confused with NoMetaMF. In contrast to MetaMF, MetaMF-SM
does not generate a private rating prediction model for each user individually, but
instead utilizes a shared rating prediction model for all users. Our NoMetaMF
model generates an individual rating prediction model for each user but operates
without meta learning. Furthermore, we note that in our implementation of
NoMetaMF, the item embeddings are generated in the same way as in MetaMF.
With NoMetaMF, we aim to investigate the impact of meta learning on the
robustness of MetaMF against decreasing privacy budgets, i.e., RQ2a.
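To make the meta network of Eqs. 1–3 more tangible, the following minimal PyTorch-style sketch shows how per-user layer parameters could be generated from a collaborative vector. All names and dimensions (MetaNet, collab_dim, etc.) are our own illustration rather than Lin et al.'s implementation, and the detach-based switch is only one possible way to realize NoMetaMF's disabled backpropagation.

import torch
import torch.nn as nn

class MetaNet(nn.Module):
    # Hypothetical meta network: maps a user's collaborative vector c_u to
    # the weights and biases of one rating prediction layer (Eqs. 1-3).
    def __init__(self, collab_dim, hidden_dim, in_dim, out_dim):
        super().__init__()
        self.hidden = nn.Linear(collab_dim, hidden_dim)             # W_h, b_h
        self.weight_head = nn.Linear(hidden_dim, in_dim * out_dim)  # U_{W_l^u}, b_{W_l^u}
        self.bias_head = nn.Linear(hidden_dim, out_dim)             # U_{b_l^u}, b_{b_l^u}
        self.in_dim, self.out_dim = in_dim, out_dim

    def forward(self, c_u, meta_learning=True):
        h = torch.relu(self.hidden(c_u))                            # Eq. 1
        W_u = self.weight_head(h).view(self.out_dim, self.in_dim)   # Eq. 2
        b_u = self.bias_head(h)                                     # Eq. 3
        if not meta_learning:
            # NoMetaMF (our approximation): cut the gradient flow through
            # the meta network, so its parameters receive no updates.
            W_u, b_u = W_u.detach(), b_u.detach()
        return W_u, b_u

A user's rating prediction layer then applies the generated parameters, e.g., y = W_u @ x + b_u for a layer input x.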
2.2 Datasets
In line with Lin et al., we conduct experiments on four datasets: Douban [14],
Hetrec-MovieLens [3], MovieLens 1M [13], and Ciao [10]. We observe that none
of these datasets comprises a high average number of ratings per item, i.e., 22.6
(Douban), 85.6 (Hetrec-MovieLens), 269.8 (MovieLens 1M), and 2.7 (Ciao). To
increase the diversity of our datasets, we include a fifth dataset to our study, i.e.,
Jester [9] with an average number of ratings per item of 41,363.6. Furthermore,
Lin et al. claimed that several observations about Ciao may be explained by its
low average number of ratings per user, i.e., 38.3. Since Jester exhibits a similarly
low average number of ratings per user, i.e., 56.3, we utilize Jester to verify Lin et
al.’s claims. To fit the rating scale of the other datasets, we scale Jester’s ratings
to a range of [1, 5]. Descriptive statistics of our five datasets are outlined in
detail in the following lines. Douban comprises 2,509 users with 893,575 ratings
for 39,576 items. Hetrec-MovieLens includes 10,109 items and 855,598 ratings of
2,113 users. The popular MovieLens 1M dataset includes 6,040 users, 3,706 items
and 1,000,209 ratings. Ciao represents 105,096 items, with 282,619 ratings from
7,373 users. Finally, our additional Jester dataset comprises 4,136,360 ratings
for 100 items from 73,421 users.
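As an illustration, this rescaling can be sketched as a min-max transformation; Jester's original rating bounds of [−10, 10] are our assumption here:

import numpy as np

def rescale(ratings, old_min=-10.0, old_max=10.0, new_min=1.0, new_max=5.0):
    # Min-max rescaling of Jester's ratings to the [1, 5] range used by
    # the other four datasets (the original bounds are our assumption).
    ratings = np.asarray(ratings, dtype=float)
    return new_min + (ratings - old_min) * (new_max - new_min) / (old_max - old_min)

print(rescale([-10.0, 0.0, 10.0]))  # -> [1. 3. 5.]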
We follow the evaluation protocol of Lin et al. and thus, perform no cross-
validation. Therefore, each dataset is randomly separated into 80% training set
Rtrain, 10% validation set Rval and 10% test set Rtest. However, we highlight
that in the case of Douban, Hetrec-MovieLens, MovieLens 1M, and Ciao, we
utilize the training, validation and test set provided by Lin et al.
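A minimal sketch of such a random 80/10/10 split (our own code, not the authors' script) could look as follows:

import numpy as np

def split_ratings(ratings, seed=42):
    # Randomly separate ratings into 80% training, 10% validation,
    # and 10% test data.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(ratings))
    n_train = int(0.8 * len(ratings))
    n_val = int(0.1 * len(ratings))
    train = [ratings[i] for i in idx[:n_train]]
    val = [ratings[i] for i in idx[n_train:n_train + n_val]]
    test = [ratings[i] for i in idx[n_train + n_val:]]
    return train, val, test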
Identification of User Groups. In RQ2b, we study how decreasing privacy
budgets influence the recommendation accuracy of user groups with different
user behavior. That is motivated by recent research [1,19], which illustrates dif-
ferences in recommendation quality for user groups with different characteristics.
As an example, [19] measures a user group's mainstreaminess, i.e., how well the group's most-listened artists match the most-listened artists of the entire population. The authors split the population into three groups of users with low,
medium, and high mainstreaminess, respectively. Their results suggest that low
mainstream users receive far worse recommendations than mainstream users.
In a similar vein, we also split users into three user groups: Low, Med, and High, referring to users with a low, medium, and high number of ratings,
respectively. To precisely study the effects of decreasing privacy budgets on each
user group, we generate them such that the variance of the number of ratings
is low, while each group still includes a sufficiently large number of users. For this matter,
each of our three user groups includes 5% of all users. In detail, we utilize the
5% of users with the least ratings (i.e., Low ), the 5% of users with the most
ratings (i.e., High) and the 5% of users, whose number of ratings are the closest
to the median (i.e., Med). Thus, each user group consists of 125 (Douban), 106
(Hetrec-MovieLens), 302 (MovieLens 1M), 369 (Ciao), and 3,671 (Jester) users.
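One possible way to derive these three groups with pandas is sketched below; the column name user_id and the data layout are our assumptions:

import pandas as pd

def user_groups(ratings: pd.DataFrame, frac=0.05):
    # Split users into Low/Med/High groups, each containing `frac` of all
    # users, based on the number of ratings per user.
    counts = ratings.groupby("user_id").size().sort_values()
    n = max(1, int(frac * len(counts)))
    low = set(counts.index[:n])    # the n users with the fewest ratings
    high = set(counts.index[-n:])  # the n users with the most ratings
    # the n users whose rating count is closest to the median
    med = set((counts - counts.median()).abs().sort_values().index[:n])
    return low, med, high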
2.3 Recommendation Evaluation
In concordance with the methodology of Lin et al., we minimize the mean squared error (MSE) between the predicted ratings r̂ ∈ R̂ and the real ratings r ∈ R as the objective function for training the model. Additionally, we report the MSE and
the mean absolute error (MAE) on the test set Rtest to estimate our models’
predictive capabilities. Since we dedicate parts of this work to shed light on
MetaMF’s and NoMetaMF’s performance in settings with different degrees of
privacy, we illustrate how we simulate decreasing privacy budgets and how we
evaluate a model’s robustness against these privacy constraints.
Simulating Different Privacy Budgets. To simulate the reluctance of users
to share their data, we propose a simple sampling procedure in Algorithm 1.
Let β be the privacy budget, i.e., the fraction of data to be shared. First, a user u randomly selects a fraction β of her ratings without replacement. Second, this random selection of ratings R_u^β is shared by adding it to the set R^β. This ensures that (i) each user has the same privacy budget β and (ii) each user shares at least one rating to receive recommendations. The set of shared ratings R^β, i.e., excluding all held-back ratings, then serves as the training set for our models.
Algorithm 1: Sampling procedure for simulating privacy budget β.

Input: Ratings R, users U, and privacy budget β.
Result: Shared ratings R^β, comprising a fraction β of each user's ratings.

R^β ← {}
for u ∈ U do
    draw a random subset R_u^β ⊆ R_u such that |R_u^β| / |R_u| = β
    R^β ← R^β ∪ R_u^β
end
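A minimal Python realization of Algorithm 1 is sketched below (our own code): ratings_per_user maps each user to her list of ratings, and rounding up slightly relaxes the exact fraction β to guarantee that every user shares at least one rating.

import math
import random

def sample_privacy_budget(ratings_per_user, beta, seed=42):
    # Simulate privacy budget beta: every user shares a random fraction
    # beta of her ratings (without replacement), but at least one rating.
    rng = random.Random(seed)
    shared = []
    for user, ratings in ratings_per_user.items():
        k = max(1, math.ceil(beta * len(ratings)))
        shared.extend(rng.sample(ratings, k))
    return shared

Calling this procedure once for each β ∈ {1.0, 0.9, ..., 0.1} yields the training sets used in our privacy-focused study in Sect. 4.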
Measuring Robustness. Our privacy-focused study is concerned with dis-
cussing MetaMF’s robustness against decreasing privacy budgets. We quantify a
model’s robustness by how the model’s predictive capabilities change by decreas-
ing privacy budgets. In detail, we introduce a novel accuracy measurement called
ΔMAE@β, which is a simple variant of the mean absolute error.
Definition 1 (ΔMAE@β). The relative mean absolute error ΔMAE@β measures the predictive capabilities of a model M under a privacy budget β relative to the predictive capabilities of M without any privacy constraints:

MAE@β = (1 / |R_test|) Σ_{r_{u,i} ∈ R_test} |r_{u,i} − M(R_train^β)_{u,i}|    (4)

ΔMAE@β = MAE@β / MAE@1.0    (5)

where M(R_train^β)_{u,i} is the estimated rating for user u on item i by the model M with parameters θ trained on the dataset R_train^β, and |·| denotes the absolute value. Please note that the same R_test is utilized for different values of β.
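Assuming that the true ratings and the predictions of both model variants are available as arrays, Definition 1 can be sketched as follows (our own code):

import numpy as np

def delta_mae_at_beta(r_true, r_pred_beta, r_pred_full):
    # Definition 1: MAE of the model trained under privacy budget beta,
    # divided by the MAE of the model trained without privacy constraints
    # (beta = 1.0), both evaluated on the same test set R_test.
    mae_beta = np.mean(np.abs(np.asarray(r_true) - np.asarray(r_pred_beta)))
    mae_full = np.mean(np.abs(np.asarray(r_true) - np.asarray(r_pred_full)))
    return mae_beta / mae_full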
Table 1. MetaMF's MAE (mean absolute error) and MSE (mean squared error) measurements (reproduced/original) for our five datasets. The non-reproducibility of the MSE on the Ciao dataset can be explained by the particularities of the MSE and the Ciao dataset. All other measurements can be reproduced (RQ1a).

Dataset            MAE          MSE
Douban             0.588/0.584  0.554/0.549
Hetrec-MovieLens   0.577/0.571  0.587/0.578
MovieLens 1M       0.687/0.687  0.765/0.760
Ciao               0.774/0.774  1.125/1.043
Jester             0.856/-      1.105/-
Furthermore, it is noteworthy that the magnitude of ΔMAE@β measurements does not depend on the underlying dataset, as it is a relative measure. Thus, one can compare a model's ΔMAE@β measurements among different datasets.
2.4 Source Code and Materials
For the reproducibility study, we utilize and extend the original implementation of MetaMF, which is provided by the authors alongside the Douban, Hetrec-MovieLens, MovieLens 1M, and Ciao dataset samples via BitBucket (https://bitbucket.org/HeavenDog/metamf/src/master/, last accessed Oct. 2020). Furthermore, we publish the entire Python-based implementation of our work on GitHub (https://github.com/pmuellner/RobustnessOfMetaMF) and our three user groups for all five datasets on Zenodo (https://doi.org/10.5281/zenodo.4031011) [18].
We want to highlight that we are not interested in outperforming any state-
of-the-art approaches on our five datasets. Thus, we refrain from conducting
any hyperparameter tuning or parameter search and utilize precisely the same
parameters, hyperparameters, and optimization algorithms as Lin et al. [16].
3 Reproducibility Study
In this section, we address RQ1a and RQ1b. As such, we repeat experiments by
Lin et al. [16] to verify the reproducibility of their results. Therefore, we evaluate
MetaMF on the four datasets Douban, Hetrec-MovieLens, MovieLens 1M, and
Ciao. Additionally, we measure its accuracy on the Jester dataset. Please note
that we strictly follow the evaluation procedure as in the work to be reproduced.
We provide MAE (mean absolute error) and MSE (mean squared error) mea-
surements on our five datasets in Table 1. It can be observed that we can repro-
duce the results by Lin et al. up to a margin of error smaller than 2%. Only in
the case of the MSE on the Ciao dataset, we obtain different results. Due to the selection of random batches during training, our model slightly deviates from the one utilized by Lin et al., and consequently, the predictions are likely to differ marginally. As described in [21], the MSE is much more sensitive to the variance of the observations than the MAE. Thus, we argue that the non-reproducibility of the MSE on the Ciao dataset can be explained by the sensitivity of the MSE to the variance of the observations in each batch. In detail, we observed in Sect. 2.2
that Ciao comprises very few ratings but lots of items. Thus, the predicted rat-
ings are sensitive to the random selection of training data within each batch.
However, it is noteworthy that we can reproduce the more stable MAE on the
Ciao dataset. Hence, we conclude that our results provide strong evidence of
the originally reported measurements being reproducible, enabling us to answer
RQ1a in the affirmative.
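The differing sensitivity of the MSE and the MAE can be illustrated with a toy example (our own numbers, unrelated to the Ciao dataset): a single high-variance error inflates the MSE far more than the MAE.

import numpy as np

errors = np.full(100, 0.5)           # 100 predictions, each off by 0.5
mae, mse = np.mean(errors), np.mean(errors ** 2)
errors_out = errors.copy()
errors_out[0] = 4.0                  # a single outlier error
mae_o, mse_o = np.mean(errors_out), np.mean(errors_out ** 2)
print(f"MAE: {mae:.3f} -> {mae_o:.3f}")  # 0.500 -> 0.535 (about +7%)
print(f"MSE: {mse:.3f} -> {mse_o:.3f}")  # 0.250 -> 0.407 (about +63%)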
Next, we study the rating prediction models’ weights and the learned item
embeddings. Again, we follow the procedure of Lin et al. and utilize the popular
t-SNE (t-distributed stochastic neighbor embedding) [17] method to reduce
the dimensionality of the weights and the item embeddings to two dimensions.
Since Lin et al. did not report any parameter values for t-SNE, we rely on the
default parameters, i.e., we set the perplexity to 30 [17]. After the dimensionality
reduction, we standardize all observations x ∈ X by (x − μ)/σ, where μ is the mean and σ is the standard deviation of X. The rating prediction model of each user
is defined as a two-layer neural network. However, we observe that Lin et al. did not describe which layer's weights they visualize. Correspondence with the lead author of Lin et al. clarified that their work visualizes only the weights of the first layer of the rating prediction models. The visualizations of
the first layer’s weights of the rating prediction models on our five datasets are
given in Fig. 1.
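Our dimensionality reduction and standardization step can be sketched with scikit-learn as follows, where perplexity 30 is the default value we relied on:

import numpy as np
from sklearn.manifold import TSNE

def embed_2d(weights, seed=42):
    # Reduce per-user weight vectors to two dimensions with t-SNE
    # (perplexity=30), then standardize each dimension to zero mean
    # and unit variance.
    emb = TSNE(n_components=2, perplexity=30, random_state=seed).fit_transform(weights)
    return (emb - emb.mean(axis=0)) / emb.std(axis=0)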
In line with Lin et al., we discuss the weights and the item embeddings with
respect to personalization and collaboration. As the authors suggest, personal-
ization leads to distinct weight embeddings and collaboration leads to clusters
within the embedding space. First, we observe that MetaMF tends to generate
different weight embeddings for each user. Second, the visualizations exhibit
well-defined clusters, which indicates that MetaMF can exploit collaborative
information among users. However, our visualizations of the weights deviate
slightly from the ones reported by Lin et al. Similar to the reproduction of the
accuracy measurements in Table 1, we attribute this to the inability to derive
the exact same model as Lin et al. Besides, t-SNE comprises random compo-
nents and thus, generates slightly varying visualizations. However, the weights
for the Ciao dataset in Fig. 1d illustrate behavior that contradicts Lin et al.’s
observations. In the case of the Ciao dataset, they did not observe any form of
clustering and attributed this behavior to the small number of ratings per user
in the Ciao dataset. To test their claim, we also illustrate the Jester dataset
with a similarly low number of ratings per user. In contrast, our visualizations
indeed show well-defined clusters and different embeddings. We note that Jester
exhibits many more clusters than the other datasets due to the much larger
number of users. Overall, we find that neither Ciao nor Jester supports the claim made by Lin et al. However, we see the possibility that this observation may be caused by randomness during training.

Fig. 1. MetaMF's weight embeddings of the first layer of the rating prediction models for (a) Douban, (b) Hetrec-MovieLens, (c) MovieLens 1M, (d) Ciao, and (e) Jester. One observation corresponds to an individual user (RQ1b).
Due to space limitations, we refrain from visualizing the item embeddings.
It is worth noticing that our observations on the weights also hold for the item
embeddings. In detail, our visualizations exhibit indications of collaboration and
personalization for all datasets. Overall, we find the visualizations of the weights
and the item embeddings presented by Lin et al. to be reproducible for the
Douban, Hetrec-MovieLens, and MovieLens 1M datasets and thus, we can also
positively answer RQ1b.
4 Privacy-Focused Study
In the following, we present experiments that go beyond reproducing Lin et al.’s
work [16]. Concretely, we explore the robustness of MetaMF against decreasing
privacy budgets and discuss RQ2a and RQ2b. In more detail, we shed light on
the effect of decreasing privacy budgets on MetaMF in two settings: (i) the role
of MetaMF’s meta learning component and (ii) MetaMF’s ability to serve users
with different amounts of rating data equally well.
First, we compare MetaMF to NoMetaMF in the setting of decreasing privacy budgets. Therefore, we utilize our sampling procedure in Algorithm 1 to generate datasets with different privacy budgets. In detail, we construct 10 training sets, i.e., {R_train^β : β ∈ {1.0, 0.9, ..., 0.2, 0.1}}, on which MetaMF and NoMetaMF are trained. Then, we evaluate both models on the test set R_test. It is worth noticing that R_test is the same for all values of β to enable a valid comparison.
Fig. 2. ΔMAE@β measurements for (a) MetaMF and (b) NoMetaMF, in which meta learning is disabled. Especially for small privacy budgets, MetaMF yields a much more stable accuracy than NoMetaMF (RQ2a).
Our results in Fig. 2a illustrate that for all datasets, MetaMF preserves its predictive capabilities well, even with decreasing privacy budgets. However, a privacy budget of 50% seems to be a critical threshold: the ΔMAE@β only marginally increases for β > 0.5, but rapidly grows for β ≤ 0.5 in the case of the Douban, Hetrec-MovieLens, and MovieLens 1M datasets. In other words, a user could afford to withhold 50% of her data and still receive well-suited recommendations. Additionally, the ΔMAE@β remains stable for the Ciao and Jester datasets. Similar observations can be made about the results of NoMetaMF in Fig. 2b. Again, the predictive capabilities remain stable for β > 0.5 in the case of Douban, Hetrec-MovieLens, and MovieLens 1M, but decrease tremendously for higher levels of privacy. Our side-by-side comparison of MetaMF and NoMetaMF in Fig. 2 suggests that both methods exhibit robust behavior for large privacy budgets (i.e., β > 0.5), but show an increasing MAE when less data is available (i.e., β ≤ 0.5). However, we would like to highlight that the increase of the MAE is much worse for NoMetaMF than for MetaMF: the ΔMAE@β indicates that the MAE of NoMetaMF grows much faster than the MAE of MetaMF for decreasing privacy budgets. This observation pinpoints the importance of meta learning and personalization in settings with a limited amount of data per user, i.e., a high privacy level. Thus, concerning RQ2a, we conclude that MetaMF is indeed more robust against decreasing privacy budgets than NoMetaMF, but yet requires a sufficient amount of data per user.
Next, we compare MetaMF to NoMetaMF with respect to their ability for
personalization and collaboration in the setting of decreasing privacy budgets.
As explained in Sect. 3, we refer to Lin et al., who suggest that personalization leads to distinct weight embeddings and collaboration leads to clusters within the embedding space. In Fig. 3, we illustrate the weights of the first layer of the rating prediction models of MetaMF and NoMetaMF for the MovieLens 1M dataset for different privacy budgets (i.e., β ∈ {1.0, 0.5, 0.1}). Again, we applied t-SNE to reduce the dimensionality to two dimensions, followed by standardization to ease the visualization.

Fig. 3. Weights of the first layer of the rating prediction models for the MovieLens 1M dataset. Panels (a), (b), and (c) depict MetaMF for β = 1.0, 0.5, and 0.1, respectively, whereas panels (d), (e), and (f) depict NoMetaMF, in which meta learning is disabled, for the same budgets. No well-defined clusters are visible for NoMetaMF, which indicates the inability to exploit collaborative information among users (RQ2a).

In the case of MetaMF, we observe that it preserves the
ability to generate different weights for each user for decreasing privacy budgets.
Similarly, well-defined clusters can be seen, which indicates that MetaMF also
preserves the ability to capture collaborative information among users. In con-
trast, our visualizations for NoMetaMF do not show well-defined clusters. This
indicates that NoMetaMF loses the ability to exploit collaborative information
among users. Due to limited space, we refrain from presenting the weights of the
first layer of the rating prediction models for the other datasets. However, we
observe that MetaMF outperforms NoMetaMF in preserving the collaboration
ability for decreasing privacy budgets on the remaining four datasets, which is
also in line with our previous results regarding RQ2a.
In the following, we elaborate on how the high degree of personalization in
MetaMF impacts the recommendations of groups of users with different amounts
of rating data. In a preliminary experiment, we measure the MAE on our three
user groups Low, Med, and High on our five datasets in Table 2. Except for the
Ciao dataset, our results provide evidence that Low is served with significantly
worse recommendations than High. In other words, users with lots of ratings are
advantaged over users with only a few ratings.
To detail the impact of decreasing privacy budgets on these user groups,
we monitor the ΔMAE@β on Low, Med, and High. The results for our five
datasets are presented in Fig. 4. Surprisingly, Low seems to be much more robust
against small privacy budgets than High. Here, we refer to our observations about
MetaMF’s performance on the Ciao and Jester dataset in Fig. 2a. In contrast
to the other datasets, Ciao and Jester comprise only a small average number of
ratings per user, i.e., 38 (Ciao) and 56 (Jester), which means that they share
a common property with our Low user group. Thus, we suspect a relationship between the robustness against decreasing privacy budgets and the amount of rating data per user.

Table 2. MetaMF's MAE (mean absolute error) measurements for our three user groups on the five datasets. Here, we simulated a privacy budget of β = 1.0. According to a one-tailed t-test, Low is significantly disadvantaged compared to High, indicated by * (α = 0.05) and **** (α = 0.0001) (RQ2b).

Dataset               Low    Med    High
Douban*               0.638  0.582  0.571
Hetrec-MovieLens****  0.790  0.603  0.581
MovieLens 1M****      0.770  0.706  0.673
Ciao                  0.773  0.771  0.766
Jester****            1.135  0.855  0.811
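The significance tests in Table 2 can be sketched with SciPy as follows (our own code, assuming the per-user absolute errors of the Low and High groups are available as arrays; equal variances are assumed, as in SciPy's default):

from scipy import stats

def low_vs_high(errors_low, errors_high):
    # One-tailed two-sample t-test: is the mean absolute error of the
    # Low group significantly larger than that of the High group?
    t, p_two_sided = stats.ttest_ind(errors_low, errors_high)
    p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
    return t, p_one_sided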
The most prominent examples of Low being more robust than High can be found in Figs. 4a, 4b, and 4c. Here, the accuracy of MetaMF
on High substantially decreases for small privacy budgets. On the one hand,
MetaMF provides strongly personalized recommendations for users with lots of
ratings, which results in a high accuracy for these users (i.e., High). On the
other hand, this personalization leads to a serious reliance on the data, which
has a negative impact on the performance in settings with small privacy budgets.
Fig. 4. MetaMF's ΔMAE@β measurements for the (a) Douban, (b) Hetrec-MovieLens, (c) MovieLens 1M, (d) Ciao, and (e) Jester datasets for all three user groups. Especially (a), (b), and (c) illustrate that High is sensitive to small privacy budgets. In contrast, Low can afford a high degree of privacy, since the accuracy of its recommendations only marginally decreases (RQ2b).

Thus, concerning RQ2b, we conclude that users with lots of ratings receive better recommendations than other users as long as they can take advantage of their abundance of data. In settings that require a high level of privacy, i.e., a low privacy budget, and in which users consequently hold back the majority of their data, those users are advantaged who do not require as much personalization from the recommender system.
5 Conclusions and Future Work
In our study at hand, we conducted two lines of research. First, we reproduced
results presented by Lin et al. in [16]. Besides, we introduced a fifth dataset,
i.e., Jester, which, in contrast to the originally utilized datasets, has plenty
of rating data per item. We found that all accuracy measurements are indeed
reproducible (RQ1a). However, our reproduction of the t-SNE visualizations of
the embeddings illustrated potential discrepancies between our and Lin et al.’s
work (RQ1b ). Second, we conducted privacy-focused studies. Here, we thor-
oughly investigated the meta learning component of MetaMF. We found that
meta learning plays an important role in preserving the accuracy of the recom-
mendations for decreasing privacy budgets (RQ2a). Furthermore, we evaluated
MetaMF’s performance with respect to decreasing privacy budgets on three user
groups that differ in their amounts of rating data. Surprisingly, the accuracy of
the recommendations for users with lots of ratings seems far more sensitive to
small privacy budgets than for users with a limited amount of data (RQ2b).
Future Work. In our future work, we will research how to cope with incomplete
user profiles in our datasets, as users may already have limited the amount
of their rating data to satisfy their privacy constraints. Furthermore, we will
develop methods that identify the ratings a user should share based on the
characteristics of the data.
Acknowledgements. We thank the Social Computing team for their rich feedback
on this work. This work is supported by the H2020 project TRUSTS (GA: 871481) and
the “DDAI” COMET Module within the COMET – Competence Centers for Excel-
lent Technologies Programme, funded by the Austrian Federal Ministry for Transport,
Innovation and Technology (bmvit), the Austrian Federal Ministry for Digital and Eco-
nomic Affairs (bmdw), the Austrian Research Promotion Agency (FFG), the province
of Styria (SFG) and partners from industry and academia. The COMET Programme
is managed by FFG.
References
1. Abdollahpouri, H., Mansoury, M., Burke, R., Mobasher, B.: The unfairness of
popularity bias in recommendation. In: Workshop on Recommendation in Multi-
stakeholder Environments in Conjunction with RecSys 2019 (2019)
2. Ammad-Ud-Din, M., et al.: Federated collaborative filtering for privacy-preserving
personalized recommendation system. arXiv preprint arXiv:1901.09888 (2019)
3. Cantador, I., Brusilovsky, P., Kuflik, T.: Second international workshop on infor-
mation heterogeneity and fusion in recommender systems. In: RecSys 2011 (2011)
4. Chen, C., Zhang, J., Tung, A.K., Kankanhalli, M., Chen, G.: Robust federated
recommendation system. arXiv preprint arXiv:2006.08259 (2020)
5. Chen, F., Luo, M., Dong, Z., Li, Z., He, X.: Federated meta-learning with fast
convergence and efficient communication. arXiv preprint arXiv:1802.07876 (2018)
6. Duriakova, E., et al.: PDMFRec: a decentralised matrix factorisation with tunable
user-centric privacy. In: RecSys 2019 (2019)
7. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation
of deep networks. In: ICML 2017 (2017)
8. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: AIS-
TATS 2011 (2011)
9. Goldberg, K., Roeder, T., Gupta, D., Perkins, C.: Eigentaste: a constant time
collaborative filtering algorithm. Inf. Retrieval 4(2), 133–151 (2001)
10. Guo, G., Zhang, J., Thalmann, D., Yorke-Smith, N.: ETAF: an extended trust
antecedents framework for trust prediction. In: ASONAM 2014 (2014)
11. Ha, D., Dai, A., Le, Q.V.: Hypernetworks. In: ICLR 2016 (2016)
12. Hahnloser, R.H., Sarpeshkar, R., Mahowald, M.A., Douglas, R.J., Seung, H.S.:
Digital selection and analogue amplification coexist in a cortex-inspired silicon
circuit. Nature 405(6789), 947–951 (2000)
13. Harper, F.M., Konstan, J.A.: The movielens datasets: history and context. ACM
Trans. Interact. Intell. Syst. (TIIS) 5(4), 1–19 (2015)
14. Hu, L., Sun, A., Liu, Y.: Your neighbors affect your ratings: on geographical neigh-
borhood influence to rating prediction. In: SIGIR 2014 (2014)
15. Jiang, Y., Konečný, J., Rush, K., Kannan, S.: Improving federated learning per-
sonalization via model agnostic meta learning. In: International Workshop on Fed-
erated Learning for User Privacy and Data Confidentiality in conjunction with
NeurIPS 2019 (2019)
16. Lin, Y., et al.: Meta matrix factorization for federated rating predictions. In: SIGIR
2020 (2020)
17. Maaten, L.V.D., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res.
9(Nov), 2579–2605 (2008)
18. Müllner, P., Kowald, D., Lex, E.: User Groups for Robustness of Meta Matrix Fac-
torization Against Decreasing Privacy Budgets (2020). https://doi.org/10.5281/
zenodo.4031011
19. Schedl, M., Bauer, C.: Distance- and rank-based music mainstreaminess measure-
ment. In: UMAP 2017 (2017)
20. Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In:
NIPS 2017 (2017)
21. Willmott, C.J., Matsuura, K.: Advantages of the mean absolute error (MAE) over
the root mean square error (RMSE) in assessing average model performance. Cli-
mate Res. 30(1), 79–82 (2005)