Table 3 - uploaded by Victor E Lee
RMSE vs. k

Source publication
Conference Paper
Full-text available
For datasets in Collaborative Filtering (CF) recommendations, even if the identifier is deleted and some trivial perturbation operations are applied to ratings before they are released, research results claim that the adversary could discriminate the individual's identity with only a little information. In this paper, we propose k...

Contexts in source publication

Context 1
... results of the first part are shown in Table 3. The experiments were run on the Netflix, Epinions, Movielens I and Movielens II datasets, respectively. ...
Context 2
... techniques were used as the filling methods. They are trust-based prediction (trusted k-coRating), Pearson correlation similarity prediction (sim k-coRating), average rating values (average k-coRating) and random rating values (random k-coRating), which correspond to the four columns in each multi-column of Table 3: trusted, sim, average and random, respectively. ...
Context 3
... results in Table 3 show that not all data filling methods guarantee better accuracy: trusted k-coRating and sim k-coRating perform best, better than the baseline for most k values; random k-coRating achieves the worst accuracy, worse than the baseline; while average k-coRating achieves intermediate accuracy, better than the random one but worse than its trusted and sim counterparts. This is why generating "well-predicted" ratings is important. ...
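The filling strategies compared above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the trusted and sim predictors require a trust network and a full CF model and are stubbed out here, and the function name and [1, 5] rating scale are assumptions.

```python
import numpy as np

def fill_null_ratings(matrix, method="average", rng=None):
    """Fill NULL (NaN) cells of a user-item rating matrix.

    Simplified illustration of the filling strategies compared in
    Table 3; the real trusted/sim predictors use trust networks and
    Pearson-correlation neighbours, which are stubbed out here.
    """
    rng = rng or np.random.default_rng(0)
    filled = matrix.copy()
    mask = np.isnan(filled)
    if method == "average":
        # average k-coRating: fill with each item's mean observed rating
        col_means = np.nanmean(filled, axis=0)
        filled[mask] = np.take(col_means, np.where(mask)[1])
    elif method == "random":
        # random k-coRating: fill with uniform random ratings in [1, 5]
        filled[mask] = rng.uniform(1, 5, size=mask.sum())
    else:
        raise NotImplementedError("trusted/sim predictors need a full CF model")
    return filled
```

The sketch makes the accuracy ranking in Table 3 intuitive: average filling at least respects each item's observed distribution, while random filling ignores it entirely.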
Context 4
... as a summary of k-coRating, Table 3 shows that, with well-predicted filling values, it may provide better recommendation accuracy while varying the values of k. Fig. 3 shows that it is able to greatly reduce the risks of suffering such attacks. (Footnote 9: supp(i) is the number of subscribers who have rated item i; ρ_i is the rating of item i in the auxiliary information aux, i.e., the information that the adversary knows; and ρ'_i is the rating of item i in the candidate record r', r' ∈ D.) ...
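The quantities in that footnote suggest a Narayanan–Shmatikov-style scoring function for the de-anonymization attack. The sketch below is one plausible reading using supp(i), ρ_i and ρ'_i, with a 1/log supp(i) weight that down-weights popular items; the exact weighting and rating-similarity measure in the paper may differ.

```python
import math

def ns_score(aux, candidate, supp, rho_max=4):
    """Score how well a candidate record r' matches auxiliary info aux.

    aux:       {item: rho_i}  ratings the adversary knows
    candidate: {item: rho'_i} ratings in the candidate record r'
    supp:      {item: supp(i)} number of subscribers who rated item i
    """
    score = 0.0
    for item, rho_i in aux.items():
        if item not in candidate:
            continue
        rho_prime = candidate[item]
        # rare items are more identifying, so weight them more heavily
        weight = 1.0 / math.log(supp[item]) if supp[item] > 1 else 1.0
        # rating similarity: 1 when identical, 0 when maximally apart
        score += weight * (1.0 - abs(rho_i - rho_prime) / rho_max)
    return score
```

The adversary would compute this score for every record in the released dataset D and pick the highest-scoring candidate; a k-coRated release blunts the attack by making many records score similarly.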

Similar publications

Article
Full-text available
Data reusability has become a distinct characteristic of scientific, commercial, and administrative practices nowadays. However, an unlimited and careless reuse of data may lead to privacy breaches and unfair impacts on individuals and vulnerable groups. Data content adaption is a key aspect of preserving data privacy and fairness. Often, such cont...
Article
Full-text available
In the era of the Internet of Things (IoT), drug developers can potentially access a wealth of real-world, participant-generated data that enable better insights and streamlined clinical trial processes. Protection of confidential data is of primary interest when it comes to health data, as medical condition influences daily, professional, and soci...
Article
Full-text available
To systematically evaluate the privacy protection performance of synthetic data generation algorithms (Synthpop, CTGAN, RTVAE, TVAE, DataSynthesizer), this study applied various safety metrics. Synthetic data is designed to protect sensitive information while maintaining statistical similarities to the original data, but a high degree of similarity...
Article
Full-text available
Objectives: Privacy and accuracy are always trade-off factors in the field of data publishing. Ideally, both factors are considered critical for data handling. Privacy loss and accuracy loss need to be kept as low as possible for an efficient data handling system. Authors have come up with various data publishing techniques aiming to achieve b...
Preprint
Full-text available
Obfuscating a dataset by adding random noises to protect the privacy of sensitive samples in the training dataset is crucial to prevent data leakage to untrusted parties for edge applications. We conduct comprehensive experiments to investigate how the dataset obfuscation can affect the resultant model weights - in terms of the model accuracy, Frob...

Citations

... In data obfuscation, fake or general data are used to obfuscate the service request data related to users' sensitive preferences, which is generally used to protect behavior privacy (Zhang, Lee, and Jin 2014; Chen et al. 2011; Shou et al. 2012). For example, aiming at the applications of web search and browsing, researchers (Wu et al. 2018a, 2018b; Shou et al. 2012) proposed some specific obfuscation-based methods. ...
Article
In a digital library, an increasingly important problem is how to prevent the exposure of user privacy in an untrusted network. This study aims to design an effective approach for the protection of user privacy in a digital library, by consulting the basic ideas of encryption and anonymization. In our proposed approach, any privacy data, which can identify a user's real identity, should be encrypted before being submitted to the library server, to achieve anonymization of user identity. Then, to solve the problem of querying encrypted privacy data, additional feature data are constructed for the encrypted data, such that much of the query processing can be completed at the server side, without decrypting the data, thereby improving the efficiency of each kind of user query operation. Both theoretical analysis and experimental evaluation demonstrate the effectiveness of the approach, which can improve the security of users' data privacy and behavior privacy on the untrusted server side, without compromising the availability (i.e., accuracy, efficiency, and usability) of digital library services. This paper provides a valuable attempt at protecting digital library users' privacy, which has a positive influence on the development of privacy-preserving libraries in an untrusted network environment.
... (ii) In a transformation method, the service request data related to users' sensitive preferences would be replaced by fake or general data (Chen, He, & Shou, 2011; Shou, Bai, & Chen, 2012; Zhang, Lee, & Jin, 2014). However, this kind of method certainly would reduce the service accuracy due to the change of the users' request data, that is, its privacy protection would result in a compromise on accuracy; thus, it is difficult to satisfy the practical requirement of digital libraries. ...
... More importantly, it requires not only the support of additional hardware and algorithms but also changes to the existing service algorithm on the server, resulting in changes to the platform architecture (i.e., a compromise to the usability), and hence decreasing the availability of the method in digital library environments. The basic idea of data transformation is to leverage dummy or general data to obfuscate the behavior data related to users' sensitive preferences (Zhang et al., 2014). For example, aiming at the applications of personalized search, researchers (Chen et al., 2011; Shou et al., 2012) proposed many specific transformation methods, whose basic idea is to use generalized preferences to replace specific ones, so as to protect user preference privacy. ...
Article
Purpose The problem of privacy protection in digital libraries is causing people to have increasingly extensive concerns. This study aims to design an approach to protect the preference privacy behind users’ book browsing behaviors in a digital library. Design/methodology/approach This paper proposes a client-based approach, whose basic idea is to construct a group of plausible book browsing dummy behaviors, and submit them together with users’ true behaviors to the untrusted server, to cover up users’ sensitive preferences. Findings Both security analysis and evaluation experiment demonstrate the effectiveness of the approach, which can ensure the privacy security of users’ book browsing preferences on the untrusted digital library server, without compromising the usability, accuracy and efficiency of book services. Originality/value To the best of the authors’ knowledge, this paper provides the first attempt to the protection of users’ behavior privacy in digital libraries, which will have a positive influence on the development of privacy-preserving libraries in the new network era.
... We will show how our algorithm theoretically satisfies DP and how it empirically protects against the NS attack. In addition, we will show how our algorithm works together with the k-coRating model [40,42] to enhance data utility. ...
... Recently, Zhang et al. introduced a seemingly naive but effective privacy-preserving model, k-coRating [40,42]. k-coRating generates predictions to fill up the significant NULL cells to achieve two seemingly conflicting goals simultaneously: alleviating data sparsity to obtain higher prediction accuracy, and hiding the actual ratings from being identified. ...
... We copy Definitions (1) and (2) in [42] to introduce the core idea of k-coRating. ...
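Without reproducing those definitions, one plausible reading of the k-coRating property can be checked as follows. The assumption that every rated item must be rated by at least k users (so each actual rating is covered by at least k-1 co-ratings) is ours, for illustration only, and may not match the paper's exact Definitions (1) and (2).

```python
import numpy as np

def is_k_corated(matrix, k):
    """Check an assumed reading of the k-coRating property:
    every item that is rated at all is rated by at least k users,
    so no single rating stands out as uniquely identifying.
    NaN cells represent NULL (unrated) entries.
    """
    rated_counts = (~np.isnan(matrix)).sum(axis=0)
    return bool(np.all((rated_counts == 0) | (rated_counts >= k)))
```

Under this reading, filling NULL cells with predicted ratings raises the per-item co-rating counts until the property holds, which is exactly how the filling methods of Table 3 serve both utility and privacy.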
... This kind of technique might lead to poor recommendation accuracy due to its changes to user preference profiles. (2) In data transformation techniques, users' personal data need to be transformed (e.g., using noise addition or data perturbation) [18], [19], [20], before being used for personalized recommendation. Generally, these techniques can only be applied to collaborative filtering algorithms. ...
... Thus, RPT can ensure not only the security of user privacy, but also the recommendation accuracy. A similar method is proposed in [19] to protect personal privacy in data mining. The paper [21] designs a collaborative filtering recommendation system based on the discrete wavelet transform (DWT) technique and the random perturbation technique. ...
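A random perturbation technique of the kind cited here can be illustrated as follows; the noise distribution, scale, and clipping are assumptions for the sketch, not the exact scheme of [18]-[21].

```python
import numpy as np

def perturb_ratings(ratings, sigma=0.5, low=1.0, high=5.0, seed=0):
    """Random-perturbation privacy sketch: add zero-mean Gaussian
    noise to every rating, then clip back to the rating scale, so
    individual ratings are masked while aggregate statistics (means,
    correlations) remain approximately preserved for CF.
    """
    rng = np.random.default_rng(seed)
    noisy = ratings + rng.normal(0.0, sigma, size=ratings.shape)
    return np.clip(noisy, low, high)
```

The design tension noted in the surrounding text is visible here: larger sigma hides ratings better but degrades the similarity computations that collaborative filtering depends on.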
Article
Personalized recommendation has demonstrated its effectiveness in alleviating the problem of information overload on the Internet. However, evidence shows that, due to concerns about personal privacy, users' reluctance to disclose their personal information has become a major barrier for the development of personalized recommendation. In this paper, we propose to generate a group of fake preference profiles, so as to cover up the user's sensitive subjects, and thus protect user personal privacy in personalized recommendation. First, we present a client-based framework for user privacy protection, which requires not only no change to existing recommendation algorithms, but also no compromise to the recommendation accuracy. Second, based on the framework, we introduce a privacy protection model, which formulates the two requirements that ideal fake preference profiles should satisfy: (1) the similarity of feature distribution, which measures the effectiveness of fake preference profiles in hiding a genuine user preference profile; and (2) the exposure degree of sensitive subjects, which measures the effectiveness of fake preference profiles in covering up the sensitive subjects. Finally, based on a subject repository of product classification, we present an implementation algorithm that meets the privacy protection model well. Both theoretical analysis and experimental evaluation demonstrate the effectiveness of our proposed approach.
... Recommender systems are becoming more and more popular with the onset of the World Wide Web and big data. Designing recommender systems requires overcoming challenges of accuracy, scalability, cold start, and privacy [19][20][21][22]. ...
Article
Before deploying a recommender system, its performance must be measured and understood. So evaluation is an integral part of the process to design and implement recommender systems. In collaborative filtering, there are many metrics for evaluating recommender systems. Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are among the most important and representative ones. To calculate MAE/RMSE, predicted ratings are compared with their corresponding true ratings. To predict item ratings, similarities between active users and their candidate neighbors need to be calculated. The complexity for the traditional and naive similarity calculation corresponding to user u and user v is quadratic in the number of items rated by u and v. In this paper, we explore the mathematical regularities underlying similarity formulas, introduce a novel data structure, and design linear time algorithms to calculate the similarities. Such complexity improvement shortens the evaluation time and will finally contribute to increasing the efficiency of design and development of recommender systems. Experimental results confirm the claim.
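The two metrics discussed in that abstract can be computed directly from paired true/predicted ratings; a minimal sketch (function name assumed):

```python
import math

def mae_rmse(true_ratings, predicted_ratings):
    """Compute MAE and RMSE over paired (true, predicted) ratings.

    MAE averages absolute errors; RMSE squares errors first, so it
    penalizes large prediction misses more heavily than MAE does.
    """
    if len(true_ratings) != len(predicted_ratings):
        raise ValueError("rating lists must be the same length")
    errors = [p - t for t, p in zip(true_ratings, predicted_ratings)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

Note that the metrics themselves are linear-time; the quadratic cost the article targets lies in the similarity calculations needed to produce the predicted ratings in the first place.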
Preprint
Collaborative filtering recommendation systems provide recommendations to users based on their own past preferences, as well as those of other users who share similar interests. The use of recommendation systems has grown widely in recent years, helping people choose which movies to watch, books to read, and items to buy. However, users are often concerned about their privacy when using such systems, and many users are reluctant to provide accurate information to most online services. Privacy-preserving collaborative filtering recommendation systems aim to provide users with accurate recommendations while maintaining certain guarantees about the privacy of their data. This survey examines the recent literature in privacy-preserving collaborative filtering, providing a broad perspective of the field and classifying the key contributions in the literature using two different criteria: the type of vulnerability they address and the type of approach they use to solve it.
Chapter
In recommender systems, collaborative filtering (CF) techniques are becoming increasingly popular with the evolution of the Internet. Such techniques are based on filtering or evaluating items through the opinions of online consumers. They use patterns learned from users' behavior or preferences to make recommendations. In this context, it is of great importance to protect users' privacy when there is a need to publish data for a specific purpose, which contributes to the usefulness of collaborative recommender systems. However, too much protection of individual privacy will lead to the loss of data utility. How to balance privacy and utility is challenging. In this paper, we propose a privacy-preserving method based on k-means and the k-coRating privacy-preserving model. First, we evaluate the k-coRating model in terms of privacy and utility. Then, according to its drawbacks, we introduce our solutions to address the problem. Finally, we compare our model with the k-coRating model in different aspects. As a result, our model outperforms the k-coRating model with respect to utility as well as privacy.