Proceedings of the ACM First International Workshop on Privacy and Anonymity for Very Large Databases, CIKM-PAVLAD 2009, Hong Kong, China, November 6, 2009; 01/2009
ABSTRACT: In the era of data explosion, privacy preservation has become a necessary part of any data mining task. Therefore, the data must be transformed to ensure privacy preservation. Meanwhile, the transformed data must retain enough quality to be used in the intended data mining task, i.e. the impact of the transformation on data quality with regard to that task must be minimized. However, the problem of transforming data to preserve privacy while minimizing this impact has been proven to be NP-hard. Also, in classification mining, different classification approaches deliver knowledge in different ways. Therefore, the data quality metric for a classification task should be tailored to the specific type of classification. In this paper, we focus on maintaining data quality in scenarios where the transformed data will be used to build associative classification models. We propose a data quality metric for associative classification. We also propose a heuristic approach to preserve privacy and maintain data quality. Finally, we validate the proposed approaches with experiments.
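To make the privacy side of this problem concrete, the sketch below checks k-Anonymity (every combination of quasi-identifier values must occur in at least k records) and applies one simple generalization step. It is a hypothetical illustration only: the function names, the toy records, and the age-to-range generalization are assumptions, and it does not implement the paper's heuristic or its data quality metric. The class label column is left untouched because an associative classifier is built from it.

```python
from collections import Counter
from typing import Sequence


def is_k_anonymous(rows: Sequence[dict], quasi_identifiers: Sequence[str], k: int) -> bool:
    """Return True if every combination of quasi-identifier values occurs in >= k rows."""
    if k < 1:
        raise ValueError("k must be at least 1")
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


def generalize_age(rows: Sequence[dict], width: int = 10) -> list:
    """Hypothetical generalization: replace exact ages with ranges, e.g. 34 -> '30-39'."""
    out = []
    for row in rows:
        low = (row["age"] // width) * width
        out.append({**row, "age": f"{low}-{low + width - 1}"})
    return out


# Toy records (invented for illustration); "class" is the class label.
rows = [
    {"age": 34, "zip": "10001", "class": "yes"},
    {"age": 37, "zip": "10001", "class": "no"},
    {"age": 52, "zip": "10002", "class": "yes"},
    {"age": 58, "zip": "10002", "class": "yes"},
]
print(is_k_anonymous(rows, ["age", "zip"], 2))                  # False
print(is_k_anonymous(generalize_age(rows), ["age", "zip"], 2))  # True
```

Each generalization step coarsens attribute values, which is where the loss of data quality for the downstream classifier comes from; choosing which steps to apply so that this loss stays small is the search problem the abstract describes as NP-hard.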
Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD '08), 2008; 09/2008
ABSTRACT: When a data mining model is to be developed, one of the most important issues is preserving the privacy of the input data. In this paper, we address the problem of transforming data to preserve privacy with regard to a particular data mining technique, associative classification, in an incremental-data scenario. We propose an incremental polynomial-time algorithm that transforms the data to meet a privacy standard, k-Anonymity, while still preserving the data quality needed to build the associative classification model. The computational complexity of the proposed incremental algorithm ranges from O(n log n) to O(Δn), depending on the characteristics of the incremental data. Experiments were conducted to compare the proposed algorithm with a non-incremental algorithm. From the experimental results, the proposed incremental algorithm is more efficient in every
pages 54-63;
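The advantage of an incremental approach can be illustrated on the checking side alone. The sketch below keeps per-group counts of quasi-identifier values, so a new batch of records is processed in time proportional to the batch size rather than by re-scanning all previously seen records. It is a hypothetical illustration under stated assumptions (insert-only data, already generalized values, invented class and method names); it only detects k-Anonymity violations and is not the authors' transformation algorithm or a reproduction of their complexity bounds.

```python
from collections import Counter
from typing import Iterable, Sequence


class IncrementalKAnonymityChecker:
    """Hypothetical sketch: tracks group sizes across batches of inserted rows.

    Each add_batch call does a constant number of expected dict/set
    operations per new row; old rows are never re-scanned.
    """

    def __init__(self, quasi_identifiers: Sequence[str], k: int):
        self.qi = tuple(quasi_identifiers)
        self.k = k
        self.counts: Counter = Counter()
        self.small: set = set()  # groups currently holding fewer than k rows

    def _key(self, row: dict) -> tuple:
        return tuple(row[a] for a in self.qi)

    def add_batch(self, rows: Iterable[dict]) -> bool:
        """Add new rows; return True if all groups now have at least k rows."""
        for row in rows:
            key = self._key(row)
            self.counts[key] += 1
            if self.counts[key] < self.k:
                self.small.add(key)
            else:
                self.small.discard(key)
        return not self.small

    def violating_groups(self) -> list:
        """Groups that would still need further transformation (e.g. generalization)."""
        return sorted(self.small)


checker = IncrementalKAnonymityChecker(["age", "zip"], k=2)
print(checker.add_batch([{"age": "30-39", "zip": "10001"}]))  # False
print(checker.add_batch([{"age": "30-39", "zip": "10001"},
                         {"age": "50-59", "zip": "10002"}]))  # False
print(checker.violating_groups())  # [('50-59', '10002')]
print(checker.add_batch([{"age": "50-59", "zip": "10002"}]))  # True
```

A full incremental algorithm would additionally decide how to transform the violating groups without undoing the work already done on earlier data; that part depends on the paper's method and is not sketched here.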