
Figure 2.9 - An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge (Dissertation)

Figure 2.9: An illustration of data swapping; statistical traits are kept.

Synthetic data sets

Synthetic data generation is another statistical disclosure control (SDC) method in which an original set of tuples is replaced with a new set of look-alike tuples that still preserves the statistical properties of the original data values (Ciriani et al., 2007). Synthetic data generation falls into two major categories: fully synthetic and partially synthetic. Fully synthetic datasets, proposed by Rubin (1993), are pseudo datasets created by replacing values in the original dataset with imputed values that retain the statistical characteristics of the original dataset while completely hiding any sensitive or private information (Rubin, 1993; Reiter, 2002). Little (1993) proposed a different approach: rather than replacing all values in the dataset with synthetic data, partially synthetic datasets replace only the sensitive values with pseudo values to enhance confidentiality (Little, 1993). However, Drechsler, 
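The idea of a partially synthetic dataset can be illustrated with a minimal sketch. It is not the imputation procedure of Rubin (1993) or Little (1993): it simply replaces one sensitive numeric column with independent draws from a normal distribution fitted to that column. The function name `partially_synthesize`, the record layout, and the choice of a normal model are all assumptions made for illustration.

```python
import random
import statistics


def partially_synthesize(records, sensitive_key, seed=0):
    """Return copies of `records` with one sensitive numeric field replaced.

    Hypothetical helper: the replacement values are independent draws from a
    normal distribution with the same mean and standard deviation as the
    original column. All other fields are copied unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = [r[sensitive_key] for r in records]
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)

    synthetic = []
    for r in records:
        new_r = dict(r)  # shallow copy; the original record is not mutated
        new_r[sensitive_key] = rng.gauss(mu, sigma)
        synthetic.append(new_r)
    return synthetic


# Toy data (made up for this example): only "salary" is treated as sensitive.
records = [
    {"id": i, "age": 20 + (i % 50), "salary": 30000 + 100 * (i % 200)}
    for i in range(2000)
]
synthetic = partially_synthesize(records, "salary")
```

Because the draws are independent of the other columns, this sketch preserves only the marginal mean and spread of the sensitive column; relationships between that column and the rest of the record are lost. Imputation-based approaches, as described in the text, model the sensitive values conditionally on the other attributes in order to retain more of the original structure.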