TY - JOUR
T1 - Effective reconstruction of data perturbed by random projections
AU - Sang, Yingpeng
AU - Shen, Hong
AU - Tian, Hui
N1 - Funding Information:
This work was done under the support of National Science Foundation of China grants No. 61170232 and No. 61100218, the Fundamental Research Funds for the Central Universities, Australian Research Council Discovery Project grant DP0985063, and University of Adelaide ECMS Research Group of Networks, Parallel and Distributed Systems. The corresponding author is Hong Shen.
PY - 2012
Y1 - 2012
N2 - Random Projection (RP) has raised great concern among the research community of privacy-preserving data mining, due to its high efficiency and utility, e.g., keeping the euclidean distances among the data points. It was shown in [33] that, if the original data set composed of m attributes is multiplied by a mixing matrix of k × m(m > k) which is random and orthogonal on expectation, then the k series of perturbed data can be released for mining purposes. Given the data perturbed by RP and some necessary prior knowledge, to our knowledge, little work has been done in reconstructing the original data to recover some sensitive information. In this paper, we choose several typical scenarios in data mining with different assumptions on prior knowledge. For the cases that an attacker has full or zero knowledge of the mixing matrix R, respectively, we propose reconstruction methods based on Underdetermined Independent Component Analysis (UICA) if the attributes of the original data are mutually independent and sparse, and propose reconstruction methods based on Maximum A Posteriori (MAP) if the attributes of the original data are correlated and nonsparse. Simulation results show that our reconstructions achieve high recovery rates, and outperform the reconstructions based on Principal Component Analysis (PCA). Successful reconstructions essentially mean the leakage of privacy, so our work identify the possible risks of RP when it is used for data perturbations.
AB - Random Projection (RP) has raised great concern among the research community of privacy-preserving data mining, due to its high efficiency and utility, e.g., keeping the euclidean distances among the data points. It was shown in [33] that, if the original data set composed of m attributes is multiplied by a mixing matrix of k × m(m > k) which is random and orthogonal on expectation, then the k series of perturbed data can be released for mining purposes. Given the data perturbed by RP and some necessary prior knowledge, to our knowledge, little work has been done in reconstructing the original data to recover some sensitive information. In this paper, we choose several typical scenarios in data mining with different assumptions on prior knowledge. For the cases that an attacker has full or zero knowledge of the mixing matrix R, respectively, we propose reconstruction methods based on Underdetermined Independent Component Analysis (UICA) if the attributes of the original data are mutually independent and sparse, and propose reconstruction methods based on Maximum A Posteriori (MAP) if the attributes of the original data are correlated and nonsparse. Simulation results show that our reconstructions achieve high recovery rates, and outperform the reconstructions based on Principal Component Analysis (PCA). Successful reconstructions essentially mean the leakage of privacy, so our work identify the possible risks of RP when it is used for data perturbations.
KW - Maximum A Posteriori
KW - Privacy-preserving data mining
KW - data perturbation
KW - data reconstruction
KW - principal component analysis
KW - underdetermined independent component analysis
UR - http://www.scopus.com/inward/record.url?scp=82555168400&partnerID=8YFLogxK
U2 - 10.1109/TC.2011.83
DO - 10.1109/TC.2011.83
M3 - Article
AN - SCOPUS:82555168400
SN - 0018-9340
VL - 61
SP - 101
EP - 117
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 1
M1 - 6085626
ER -