Approximate Graph Matching Joseph Lubars and R. Srikant ECE/CSL UIUC
Problem Statement Given two correlated graphs, one with known node identities and one with unknown (or incorrect) node identities. Goal: infer the identities of the nodes in the second graph • The two graphs are not identical (edges 0-2 and 0-6 exist in the first graph, but not in the second)
Computational Complexity Requirement We are interested in very large graphs:
Application 1: Social Networks Sample edges from an underlying friendship graph to obtain social networks. [Figure: friendship graph over Bob, Alice and Carol, with a sampled network containing Alice and Carol]
Application 1: Social Networks Use the graph topology of one social network to deanonymize members of another network. [Figure: network whose members Bob, Alice and Carol are marked "?"]
Application 2: Protein Interaction Find proteins with similar functions across different species based on the topologies of their interaction networks. [Figure: Human Network and Mouse Network; protein identifiers shown: Q8WUU5, Q920S3, P06436, P58391, Q9Y365, P62805, Q9JMD3, P62806]
Application 3: Wikipedia Articles Automatically find or correct corresponding articles in different versions of Wikipedia based on the graph of article links. [Figure: English Wikipedia and French Wikipédia link graphs; articles shown: Hydrosphere / Hydrosphère, Sun / Soleil, Earth / Terre, Solar System / Système solaire, Supercontinent / Supercontinent]
Mathematical Model Note: permuting the node identities, giving the nodes different identities, and erasing the node identities are all equivalent. Permute node labels of one graph
Prior Results
Permutation Matrices
Mismatch Metric
Convex Relaxation
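The formulas on the three slides above did not survive in the text, so the following is only a standard way of writing these objects, offered as an assumption rather than the slides' own notation. Here $A$ and $B$ are the adjacency matrices of the two graphs, $\mathcal{P}_n$ is the set of $n \times n$ permutation matrices, and $\mathcal{D}_n$ is the set of doubly stochastic matrices (all symbols are assumed).

```latex
\begin{align*}
\hat{P} &= \arg\min_{P \in \mathcal{P}_n} \lVert A - P B P^{\top} \rVert_F^2
         = \arg\min_{P \in \mathcal{P}_n} \lVert A P - P B \rVert_F^2, \\
\hat{D} &= \arg\min_{D \in \mathcal{D}_n} \lVert A D - D B \rVert_F^2,
\qquad
\mathcal{D}_n = \{\, D \ge 0 : D\mathbf{1} = \mathbf{1},\ D^{\top}\mathbf{1} = \mathbf{1} \,\}.
\end{align*}
```

The two forms in the first line agree because right-multiplying by a permutation matrix preserves the Frobenius norm and $(A - P B P^{\top})P = AP - PB$. For undirected, unweighted graphs the objective counts every mismatched edge twice. The second line replaces the combinatorial set $\mathcal{P}_n$ by its convex hull, which turns the search into a convex quadratic program.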
Seed-Based Approaches (Narayanan-Shmatikov, 2009)
Our Model and Algorithm
Our Model Sample Edges Permute node labels
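The two steps named on this slide, sampling edges and permuting node labels, can be sketched as follows. This is a minimal sketch under assumptions not stated in the text: each parent edge is kept independently with the same probability `s` in each copy, and the function name `sample_correlated_pair` and all parameter names are hypothetical.

```python
import random

def sample_correlated_pair(parent_edges, n, s, rng):
    """Return (edges1, edges2, perm) for two edge-sampled copies of a parent graph.

    parent_edges: set of (u, v) tuples with u < v over nodes 0..n-1.
    s: assumed probability of keeping each parent edge, independently per copy.
    perm[i] is the label that parent node i carries in the second graph.
    """
    # First copy: sample edges, keep the original labels.
    edges1 = {e for e in parent_edges if rng.random() < s}

    # Second copy: sample edges independently, then relabel the nodes.
    perm = list(range(n))
    rng.shuffle(perm)
    edges2 = set()
    for u, v in parent_edges:
        if rng.random() < s:
            a, b = perm[u], perm[v]
            edges2.add((min(a, b), max(a, b)))
    return edges1, edges2, perm
```

With `s = 1.0` both copies equal the parent graph up to relabeling; smaller values of `s` make the two graphs correlated but not identical, which is the situation described in the problem statement.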
The Algorithm: Witnesses (Korula-Lattanzi)
MWM on Bipartite Graphs
Step 2: Greedy Matching, instead of MWM
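One round of witness counting followed by greedy matching might look like the sketch below. It assumes the usual notion of a witness: a currently matched pair (a, b) is a witness for a candidate pair (u, v) when a is a neighbor of u in the first graph and b is a neighbor of v in the second. The function names, the dictionary-based graph representation, and the tie-breaking rule are illustrative assumptions, not details taken from the slides.

```python
from collections import defaultdict

def witness_counts(adj1, adj2, matching):
    """Count witnesses for every candidate pair (u, v).

    adj1, adj2: dicts mapping each node to the set of its neighbors.
    matching: dict mapping graph-1 nodes to graph-2 nodes (may contain errors).
    """
    counts = defaultdict(int)
    for a, b in matching.items():
        # The matched pair (a, b) vouches for every pair of their neighbors.
        for u in adj1[a]:
            for v in adj2[b]:
                counts[(u, v)] += 1
    return counts

def greedy_matching(counts):
    """Take pairs in decreasing witness order, skipping already used nodes."""
    used1, used2, result = set(), set(), {}
    # Ties are broken by node label only to make the output deterministic.
    for (u, v), _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if u not in used1 and v not in used2:
            result[u] = v
            used1.add(u)
            used2.add(v)
    return result
```

The greedy pass needs only a sort over the candidate pairs that received at least one witness, which is the practical appeal over solving a maximum weight matching on the full bipartite graph of candidate pairs.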
The Algorithm: Interpretation
Why Does Greedy Matching Work?
Simulations • In practice, the algorithm is initialized with a random matching between the two graphs • Suppose 1% of the matches are correct initially; running the algorithm once may raise this above 1% • Running it again increases the number of correct matches further • Repeat several times • Threshold phenomenon: if the initial number of correct matches is too small, iterating does not help; otherwise, the algorithm can match “all” nodes correctly
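The experiment described in the bullets above can be sketched as a small harness: build an initial matching with a chosen fraction of correct pairs, apply a refinement step repeatedly, and record the fraction correct after each round. The refinement step is passed in as a callable `step`, which is a hypothetical stand-in for one round of the algorithm; all function names here are assumptions made for illustration.

```python
import random

def initial_matching(truth, fraction, rng):
    """Build a matching in which about `fraction` of the pairs are correct."""
    nodes = list(truth)
    rng.shuffle(nodes)
    k = round(fraction * len(nodes))
    keep, rest = nodes[:k], nodes[k:]
    matching = {u: truth[u] for u in keep}
    # Cyclic shift of the remaining targets: each of these pairs is wrong
    # whenever at least two nodes remain.
    for i, u in enumerate(rest):
        matching[u] = truth[rest[(i + 1) % len(rest)]]
    return matching

def fraction_correct(matching, truth):
    """Fraction of nodes whose current match equals the true match."""
    return sum(matching.get(u) == v for u, v in truth.items()) / len(truth)

def run_rounds(matching, step, truth, rounds):
    """Apply `step` repeatedly; return the accuracy before and after each round."""
    history = [fraction_correct(matching, truth)]
    for _ in range(rounds):
        matching = step(matching)
        history.append(fraction_correct(matching, truth))
    return history
```

Sweeping `fraction` over a range of values and plotting the last entry of the returned history is one way to look for the threshold behavior described in the final bullet.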
Performance on Two Different Graph Models [Figure: two panels, Stochastic Block Model and Barabási-Albert Model; axis label: fraction of initially correct matches]
Real-World Graphs (Simulated Sampling) [Figure: two panels, Epinions Social Network and Slashdot Social Network; axis label: fraction of initially correct matches]
Seedless Matching In Practice
Our Algorithm Alone for Seedless Matching
Small Number of Seeds (Epinions Network)
Main Theoretical Result
Conclusions • A new algorithm for matching two graphs • Potential Application: Uncover identities of anonymous nodes • Works significantly better than existing approaches, especially for graphs with very large variations in node degrees