Reconstruction from Randomized Graph via Low Rank Approximation

Leting Wu, Xiaowei Ying, Xintao Wu
Dept. of Software and Information Systems, Univ. of N.C. – Charlotte

Outline
• Background & Motivation
• Low Rank Approximation on Graph Data
• Reconstruction from Randomized Graph
• Evaluation
• Privacy Issue

Background & Motivation

Background
• When network data is published or outsourced for mining and analysis, pure anonymization is not enough to protect privacy because of topology-based attacks (active/passive attacks, subgraph attacks).
• Graph randomization/perturbation:
  • Random Add/Del edges (number of edges unchanged)
  • Random Switch edges (node degrees unchanged)
  • Feature-preserving randomization
  • Spectrum-preserving randomization
  • Feature preserving via Markov-chain-based graph generation
  • Clustering: grouping subgraphs into supernodes
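
The two simplest schemes in the list can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the function names `rand_add_del` and `rand_switch`, the edge-list representation, and the `max_tries` guard are all hypothetical choices made for this sketch.

```python
import random

def rand_add_del(edges, n, k, seed=None):
    """Delete k existing edges and add k absent ones (edge count unchanged)."""
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    # All node pairs that are not currently connected (O(n^2), fine for a sketch).
    non_edges = [(u, v) for u in range(n) for v in range(u + 1, n)
                 if (u, v) not in edge_set]
    removed = rng.sample(sorted(edge_set), k)
    added = rng.sample(non_edges, k)
    return (edge_set - set(removed)) | set(added)

def rand_switch(edges, k, seed=None, max_tries=10000):
    """Up to k switches: (a,b),(c,d) become (a,d),(c,b); node degrees unchanged."""
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    done = tries = 0
    while done < k and tries < max_tries:
        tries += 1
        (a, b), (c, d) = rng.sample(sorted(edge_set), 2)
        if len({a, b, c, d}) < 4:
            continue  # the two edges share a node; switching would not be valid
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new1, new2}
        done += 1
    return edge_set
```

Rand Add/Del keeps only the total number of edges fixed, while Rand Switch keeps every node's degree fixed, which is why the slide lists the two separately.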

Motivation
• We focus on whether we can reconstruct a graph from the randomized graph such that [condition missing in source].
[Figure label: Our Focus]

Low Rank Approximation on Graph Data

Adjacency Matrix & Its Eigen-Decomposition
Matrix representation of a network
• Adjacency matrix A (symmetric)
• Eigen-decomposition: $A = \sum_{i=1}^{n} \lambda_i x_i x_i^T$, where $(\lambda_i, x_i)$ are the eigen-pairs of A
Questions:
• How are the eigen-pairs related to graph topology?

Leading Eigenpairs vs. Graph Topology
• What are the roles of positive and negative eigen-pairs in graph topology?
• Without loss of generality, we partition the node set into two groups, and the adjacency matrix can be partitioned as
  $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$
  where $A_{11}$ and $A_{22}$ represent the edges within the two groups and $A_{12}$ (with $A_{21} = A_{12}^T$) represents the edges between the groups.
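
As a small illustration of the question above, the sketch below builds two extreme cases of the two-group partition and prints their eigenvalues. It is not taken from the slides: the helper `block_adjacency` and the 4-node group size are assumptions chosen for this example, and it relies only on standard facts about cliques and complete bipartite graphs.

```python
import numpy as np

def block_adjacency(n, within, between):
    """2n-node graph in two groups of n; `within`/`between` are 0 or 1 for all pairs."""
    ones = np.ones((n, n))
    A11 = within * (ones - np.eye(n))   # within-group edges, no self-loops
    A12 = between * ones                # between-group edges
    return np.block([[A11, A12], [A12.T, A11]])

communities = block_adjacency(4, within=1, between=0)  # two disjoint cliques
bipartite = block_adjacency(4, within=0, between=1)    # complete bipartite graph

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
ev_comm = np.linalg.eigvalsh(communities)
ev_bip = np.linalg.eigvalsh(bipartite)
print(np.round(ev_comm, 6))
print(np.round(ev_bip, 6))
```

When all edges lie within the groups, the two largest-magnitude eigenvalues are both positive (3 and 3). When all edges lie between the groups, the largest-magnitude eigenvalues come as a positive and negative pair (4 and -4).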

Leading Eigenpairs vs. Graph Topology
[Figure: Original, r = 1, r = 2]

Leading Eigenpairs vs. Graph Topology
[Figure: Original, r = 1, r = 2]

Leading Eigenpairs vs. Graph Topology
[Figure: Original, r = 1, r = 2, r = 4]

Low Rank Approximation on Graph Data
• Low rank approximation: $A_r = \sum_{i=1}^{r} \lambda_i x_i x_i^T$, where $(\lambda_i, x_i)$ are the r leading eigen-pairs of A. This provides a best rank-r approximation to A.
• To keep the structure of an adjacency matrix, discretize $A_r$ as follows: [formula missing in source]
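
A minimal NumPy sketch of these two steps follows. Two points are assumptions, since the slide's formulas did not survive extraction: "leading" is taken to mean largest absolute eigenvalue, and the discretization uses a 0.5 threshold as a stand-in for the authors' rule. The function names are hypothetical.

```python
import numpy as np

def low_rank_approx(A, r):
    """Rank-r approximation of symmetric A from the r eigen-pairs of largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:r]
    return (vecs[:, idx] * vals[idx]) @ vecs[:, idx].T

def discretize(A_r, threshold=0.5):
    """Assumed rule (not from the slides): entry >= threshold becomes an edge."""
    S = (A_r + A_r.T) / 2          # remove floating-point asymmetry
    B = (S >= threshold).astype(int)
    np.fill_diagonal(B, 0)         # no self-loops
    return B

# Example: two disjoint 4-cliques are recovered exactly from a rank-2 approximation.
K4 = np.ones((4, 4)) - np.eye(4)
Z = np.zeros((4, 4))
A = np.block([[K4, Z], [Z, K4]])
A_hat = discretize(low_rank_approx(A, 2))
```

In this example the rank-2 approximation has entries of 0.75 inside each clique block and 0 elsewhere, so thresholding followed by clearing the diagonal returns the original adjacency matrix.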

Reconstruction from Randomized Graph

Reconstructed Features (Political Blogs, Rand Add/Del 40% of Edges)

Determine Number of Eigen-pairs
Question:
• How to choose an optimal rank r for reconstruction?
Solution:
• Choose [symbol missing in source] as the indicator, since it is closely related to the other features and has an explicit moment estimator [formula missing in source], where m is the number of edges, k is the number of edges added/deleted, ...

Algorithm
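
The algorithm itself did not survive extraction, so the following is only a rough sketch of how rank selection with an indicator feature could work, not the authors' procedure. It assumes that an estimate `target` of the indicator on the original graph is already available (for example, from a moment estimator), that the indicator is supplied as a function `feature`, and that discretization uses a 0.5 threshold. All names are hypothetical.

```python
import numpy as np

def reconstruct(A_tilde, target, feature, max_rank=None, threshold=0.5):
    """Try r = 1..max_rank; return (r, A_hat) whose feature value is closest to target."""
    vals, vecs = np.linalg.eigh(A_tilde)
    order = np.argsort(-np.abs(vals))      # eigen-pairs by decreasing |eigenvalue|
    n = A_tilde.shape[0]
    max_rank = max_rank or n
    best = None
    A_r = np.zeros((n, n))
    for r in range(1, max_rank + 1):
        i = order[r - 1]
        A_r = A_r + vals[i] * np.outer(vecs[:, i], vecs[:, i])   # add the r-th eigen-pair
        A_hat = ((A_r + A_r.T) / 2 >= threshold).astype(int)      # assumed discretization
        np.fill_diagonal(A_hat, 0)
        gap = abs(feature(A_hat) - target)
        if best is None or gap < best[0]:
            best = (gap, r, A_hat)
    return best[1], best[2]

def largest_eigenvalue(A):
    """Example indicator, chosen here for illustration only."""
    return np.linalg.eigvalsh(A)[-1]
```

A call such as `r, A_hat = reconstruct(A_tilde, target, largest_eigenvalue)` returns the rank whose discretized reconstruction best matches the estimated indicator value.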

Evaluation

Effect of Noise (Political Blogs)
• The method works well up to a certain level of noise.
• Even with a high level of noise, the reconstructed features are still closer to the original ones than the randomized features are.

Reconstructed Features on Three Real Network Data Sets
• Reconstruction quality: [definition missing in source]
• When [condition missing in source], the reconstructed features are closer to the original ones than the randomized features are.
• The reconstruction quality is positive for all three data sets.

Privacy Issue

Privacy Issue
Can this reconstruction be used by attackers?
• Define the normalized Frobenius distance between A and [symbol missing in source] as [formula missing in source]
• Question 1: [text missing in source]
[Figures: Normalized F Norm plots for Political Blogs, Enron, and Political Books]
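
Because the slide's formula is missing, the sketch below uses one common normalization as an assumption: the Frobenius norm of the difference divided by the Frobenius norm of the original matrix, $\|A - B\|_F / \|A\|_F$. The function names are hypothetical and the slide's actual definition may differ.

```python
import math

def frobenius_norm(M):
    """Square root of the sum of squared entries of a matrix given as nested lists."""
    return math.sqrt(sum(x * x for row in M for x in row))

def normalized_f_distance(A, B):
    """Assumed definition: ||A - B||_F / ||A||_F."""
    diff = [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]
    return frobenius_norm(diff) / frobenius_norm(A)

# Example: a triangle A versus a path B that is missing the edge between nodes 0 and 2.
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
B = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
d = normalized_f_distance(A, B)
```

Under this assumed definition, a value of 0 means the two graphs are identical, and smaller values mean an attacker's reconstruction is closer to the original graph.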

Privacy Issue
• Question 2: For which types of graphs would privacy be breached?
• For low rank graphs, which have [condition missing in source], the distance between the reconstructed graph and the original graph can be very small.

Synthetic Low Rank Graphs
• Here is a set of synthetic low rank graphs generated from Political Blogs. The reconstruction works on both the distance ...

Conclusion
• We show the relationship between graph topological structure and the eigen-pairs of the adjacency matrix.
• We propose a low rank approximation based reconstruction algorithm with a novel solution for determining the optimal rank.
• For most social networks, our algorithm does not incur further disclosure risks to individual privacy, except for networks with low rank or a small number of dominant eigenvalues.

Thank You! Questions?
Acknowledgments: This work was supported in part by U.S. National Science Foundation grants IIS-0546027 and CNS-0831204.