TY - GEN
T1 - Equi-width data swapping for private data publication
AU - Li, Yidong
AU - Shen, Hong
PY - 2009
Y1 - 2009
N2 - Data swapping is a popular value-invariant data perturbation technique. The quality of a data swapping method is measured by how well it preserves data privacy and data utility. As swapping data globally is computationally impractical, appropriate localization schemes are often applied in advance to guarantee its performance in these metrics. Equi-depth partitioning is preferred by most existing data perturbation techniques as it provides uniform privacy protection for each data tuple. However, this method performs ineffectively for two types of applications: one is to maintain statistics based on equi-width partitioning, such as the multivariate histogram with equal bin width, and the other is to preserve parametric statistics, such as covariance, in the context of sparse data with non-uniform distribution. As a natural solution for the above applications, this paper explores the possibility of using data swapping with equi-width partitioning for private data publication, which has been little used in data perturbation due to the difficulty of preserving data privacy. With extensive theoretical analysis and experimental results, we show that Equi-Width Swapping (EWS) can achieve a similar performance in privacy preservation to that of Equi-Depth Swapping (EDS) if the number of partitions is sufficiently large (e.g. ≥ √N, where N is the size of the dataset). Our experimental results on both synthetic and real-world data validate our theoretical analysis.
AB - Data swapping is a popular value-invariant data perturbation technique. The quality of a data swapping method is measured by how well it preserves data privacy and data utility. As swapping data globally is computationally impractical, appropriate localization schemes are often applied in advance to guarantee its performance in these metrics. Equi-depth partitioning is preferred by most existing data perturbation techniques as it provides uniform privacy protection for each data tuple. However, this method performs ineffectively for two types of applications: one is to maintain statistics based on equi-width partitioning, such as the multivariate histogram with equal bin width, and the other is to preserve parametric statistics, such as covariance, in the context of sparse data with non-uniform distribution. As a natural solution for the above applications, this paper explores the possibility of using data swapping with equi-width partitioning for private data publication, which has been little used in data perturbation due to the difficulty of preserving data privacy. With extensive theoretical analysis and experimental results, we show that Equi-Width Swapping (EWS) can achieve a similar performance in privacy preservation to that of Equi-Depth Swapping (EDS) if the number of partitions is sufficiently large (e.g. ≥ √N, where N is the size of the dataset). Our experimental results on both synthetic and real-world data validate our theoretical analysis.
UR - http://www.scopus.com/inward/record.url?scp=77950983950&partnerID=8YFLogxK
U2 - 10.1109/PDCAT.2009.69
DO - 10.1109/PDCAT.2009.69
M3 - Conference contribution
AN - SCOPUS:77950983950
SN - 9780769539140
T3 - Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings
SP - 231
EP - 238
BT - 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2009
T2 - 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2009
Y2 - 8 December 2009 through 11 December 2009
ER -