Reconstruction from Randomized Graph via Low Rank Approximation

- Slides: 25
Reconstruction from Randomized Graph via Low Rank Approximation
Leting Wu, Xiaowei Ying, Xintao Wu
Dept. of Software and Information Systems, Univ. of N.C. – Charlotte
Outline
- Background & Motivation
- Low Rank Approximation on Graph Data
- Reconstruction from Randomized Graph
- Evaluation
- Privacy Issue
Background & Motivation
Background
- In the process of publishing or outsourcing network data for mining and analysis, pure anonymization is not enough to protect privacy because of topology-based attacks (active/passive attacks, subgraph attacks).
- Graph Randomization/Perturbation:
- Random Add/Del edges (number of edges unchanged)
- Random Switch edges (node degrees unchanged)
- Feature preserving randomization
- Spectrum preserving randomization
- Feature preserving via Markov-chain based graph generation
- Clustering: grouping subgraphs into supernodes
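The two edge-level strategies named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `rand_add_del` and `rand_switch`, the edge-list representation, and the `seed` and `max_tries` parameters are all assumptions made for the example.

```python
import random

def rand_add_del(edges, n, k, seed=0):
    """Delete k existing edges and add k new ones, so the edge count is unchanged."""
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    removed = rng.sample(sorted(edge_set), k)
    kept = edge_set - set(removed)
    # New edges are drawn from node pairs that were non-edges in the original graph.
    added = set()
    while len(added) < k:
        u, v = rng.sample(range(n), 2)
        e = (min(u, v), max(u, v))
        if e not in edge_set:
            added.add(e)
    return kept | added

def rand_switch(edges, k, seed=0, max_tries=10000):
    """Switch k pairs of edges (a,b),(c,d) -> (a,d),(c,b), so every node keeps its degree."""
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    done = tries = 0
    while done < k and tries < max_tries:
        tries += 1
        (a, b), (c, d) = rng.sample(sorted(edge_set), 2)
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop or leave the graph unchanged
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if e1 in edge_set or e2 in edge_set:
            continue  # would create a duplicate edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {e1, e2}
        done += 1
    return edge_set

# Usage on a 10-node ring (hypothetical toy graph).
ring = [(i, (i + 1) % 10) for i in range(10)]
perturbed = rand_add_del(ring, n=10, k=3)
switched = rand_switch(ring, k=3)
```

Both functions return a set of undirected edges stored as sorted node pairs.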
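Motivation
- We focus on whether we can reconstruct a graph from the randomized graph such that its features are closer to those of the original graph than the features of the randomized graph are.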
Low Rank Approximation on Graph Data
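Adjacency Matrix & Its Eigen-Decomposition
Matrix Representation of Network
- Adjacency Matrix A (symmetric)
- Eigen-decomposition: $A = \sum_{i=1}^{n} \lambda_i x_i x_i^T$, where $\lambda_i$ is the i-th eigenvalue of A and $x_i$ is the corresponding eigenvector
Questions:
- What are their relations with graph topology?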
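As a small illustration of the eigen-decomposition of a symmetric adjacency matrix, the sketch below uses NumPy on a toy 4-node path graph. The graph is an assumption chosen only for the example. It checks that summing the rank-one terms built from each eigenpair recovers A.

```python
import numpy as np

# Toy symmetric 0/1 adjacency matrix of a 4-node path graph (illustrative only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order
# and the eigenvectors are the orthonormal columns of eigvecs.
eigvals, eigvecs = np.linalg.eigh(A)

# Rebuild A as the sum of eigenvalue * outer(eigenvector, eigenvector).
A_rebuilt = sum(lam * np.outer(x, x) for lam, x in zip(eigvals, eigvecs.T))

print(np.allclose(A, A_rebuilt))
```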
Leading Eigenpairs vs. Graph Topology
- What is the role of positive and negative eigen-pairs in graph topology?
- Without loss of generality, we partition the node set into two groups, and the adjacency matrix can be partitioned as $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$, where $A_{11}$ and $A_{22}$ represent the edges within the two groups and $A_{12}$ represents the edges between the groups
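One way to explore the question above is to look at the rank-one term contributed by a single eigenpair on a graph with a known two-group structure. The sketch below is an illustration on a complete bipartite graph with four nodes per group, a toy example that is not taken from the slides. On this graph the negative eigenpair adds weight to the between-group block and removes it from the within-group blocks, while the positive eigenpair adds the same weight everywhere.

```python
import numpy as np

n = 4
J = np.ones((n, n))
Z = np.zeros((n, n))
# Complete bipartite graph: no edges within a group, every edge between groups.
A = np.block([[Z, J], [J, Z]])

eigvals, eigvecs = np.linalg.eigh(A)          # ascending eigenvalues
lam_neg, x_neg = eigvals[0], eigvecs[:, 0]    # most negative eigenpair
lam_pos, x_pos = eigvals[-1], eigvecs[:, -1]  # most positive eigenpair

# Rank-one contribution of each eigenpair to A.
term_neg = lam_neg * np.outer(x_neg, x_neg)
term_pos = lam_pos * np.outer(x_pos, x_pos)

within_neg = term_neg[:n, :n]    # block for edges inside the first group
between_neg = term_neg[:n, n:]   # block for edges between the groups

print(lam_neg, lam_pos)
print(within_neg.mean(), between_neg.mean(), term_pos.mean())
```

For this particular graph the two terms together already reproduce A, which is why it makes a convenient test case.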
Leading Eigenpairs vs. Graph Topology [Figure: panels Original, r=1, r=2]
Leading Eigenpairs vs. Graph Topology [Figure: panels Original, r=1, r=2]
Leading Eigenpairs vs. Graph Topology [Figure: panels Original, r=1, r=2, r=4]
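Low Rank Approximation on Graph Data
- Low Rank Approximation: $A_r = \sum_{i=1}^{r} \lambda_i x_i x_i^T$, where $(\lambda_i, x_i)$ are the r leading eigenpairs of A. This provides the best rank-r approximation to A.
- To keep the structure of an adjacency matrix, the entries of the approximation are then discretized to 0/1 values.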
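A minimal sketch of the two steps above, under stated assumptions: "leading" is taken to mean largest in absolute value, and the discretization rule is assumed to be a cutoff at 0.5, since the slide's own rule did not survive extraction. The function name `low_rank_reconstruct` is hypothetical.

```python
import numpy as np

def low_rank_reconstruct(A, r, threshold=0.5):
    """Rank-r approximation of a symmetric matrix, then 0/1 discretization.

    Assumptions: eigenpairs are ranked by |eigenvalue|, and entries at or
    above `threshold` become edges. Neither choice is confirmed by the slides.
    """
    eigvals, eigvecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(eigvals))[:r]
    # Sum of lambda_i * x_i x_i^T over the selected eigenpairs.
    A_r = (eigvecs[:, idx] * eigvals[idx]) @ eigvecs[:, idx].T
    A_hat = (A_r >= threshold).astype(int)
    np.fill_diagonal(A_hat, 0)  # no self-loops in a simple graph
    return A_r, A_hat

# Usage on a toy complete bipartite graph with four nodes per group.
J, Z = np.ones((4, 4)), np.zeros((4, 4))
A = np.block([[Z, J], [J, Z]])
A_r, A_hat = low_rank_reconstruct(A, r=2)
```

The threshold is exposed as a parameter so that a different discretization rule can be tried without changing the rest of the sketch.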
Reconstruction from Randomized Graph
Reconstructed Features (Political Blogs, Rand Add/Del 40% of Edges)
Determine Number of Eigen-pairs
Question:
- How to choose an optimal rank r for reconstruction?
Solution:
- Choose as the indicator a feature that is closely related to the other features and for which an explicit moment estimator exists (expressed using m, the number of edges, and k, the number of edges added/deleted).
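The rank selection idea can be sketched as a scan over candidate ranks, keeping the one whose reconstruction best matches an estimated value of the indicator feature. Everything concrete here is an assumption: the indicator is taken to be the largest eigenvalue purely for illustration, the estimate `target` is passed in rather than computed (the slide's estimator formula was lost), and `reconstruct`, `largest_eigenvalue`, and `choose_rank` are hypothetical names.

```python
import numpy as np

def reconstruct(A_tilde, r, threshold=0.5):
    """Rank-r approximation of the randomized graph, discretized to 0/1 (assumed rule)."""
    eigvals, eigvecs = np.linalg.eigh(A_tilde)
    idx = np.argsort(-np.abs(eigvals))[:r]
    A_r = (eigvecs[:, idx] * eigvals[idx]) @ eigvecs[:, idx].T
    A_r = (A_r + A_r.T) / 2  # guard against round-off asymmetry
    A_hat = (A_r >= threshold).astype(float)
    np.fill_diagonal(A_hat, 0.0)
    return A_hat

def largest_eigenvalue(A):
    """Illustrative indicator feature; the slide's actual indicator is not shown."""
    return np.linalg.eigvalsh(A)[-1]

def choose_rank(A_tilde, target, max_rank, feature=largest_eigenvalue):
    """Return the rank whose reconstruction has feature value closest to `target`."""
    best_r, best_gap = None, np.inf
    for r in range(1, max_rank + 1):
        gap = abs(feature(reconstruct(A_tilde, r)) - target)
        if gap < best_gap:
            best_r, best_gap = r, gap
    return best_r
```

In practice `target` would come from the moment estimator applied to the randomized graph; here it is left as an input so the sketch does not guess at the formula.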
Algorithm
Evaluation
Effect of Noise (Political Blogs)
- The method works well up to a certain level of noise
- Even with a high level of noise, the reconstructed features are still closer to the original than the randomized ones
Reconstructed Features on 3 Real Network Data Sets
- Reconstruction Quality
- When the reconstruction quality measure is positive, the reconstructed features are closer to the original ones than the randomized ones
- The measure is positive for all three data sets
Privacy Issue
Privacy Issue
- Question 1: Can this reconstruction be used by attackers?
- Measure: the normalized Frobenius distance between A and the reconstructed graph
[Figure: Normalized F Norm panels for Political Blogs, Enron, Political Books]
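A normalized Frobenius distance between two adjacency matrices can be computed as below. The normalization used here, dividing by the Frobenius norm of the original matrix, is an assumption, because the slide's exact definition was lost in extraction. The function name is hypothetical.

```python
import numpy as np

def normalized_frobenius_distance(A, B):
    """||A - B||_F / ||A||_F (assumed normalization, not confirmed by the slides)."""
    return np.linalg.norm(A - B, "fro") / np.linalg.norm(A, "fro")

# Usage: compare an original graph with a candidate reconstruction.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
A_hat = np.array([[0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 0]], dtype=float)
d = normalized_frobenius_distance(A, A_hat)
```

A value of 0 means the two matrices are identical, and smaller values mean the candidate graph is closer to the original.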
Privacy Issue
- Question 2: Which type of graphs would have privacy breached?
- For low rank graphs, the distance between the reconstructed graph and the original graph can be very small
Synthetic Low Rank Graphs
- Here is a set of synthetic low rank graphs generated from Political Blogs, and you can see that the reconstruction works on both the distance
Conclusion
- We show the relationship between graph topological structure and the eigen-pairs of the adjacency matrix
- We propose a low rank approximation based reconstruction algorithm with a novel solution to determine the optimal rank
- For most social networks, our algorithm does not incur further disclosure risks of individual privacy, except for networks with low ranks or a small number of dominant eigenvalues
Thank You! Questions?
Acknowledgments
This work was supported in part by U.S. National Science Foundation IIS-0546027 and CNS-0831204.