Graph Homomorphism Revisited for Graph Matching
Wenfei Fan, Shuai Ma, Yinghui Wu (University of Edinburgh)
Jianzhong Li, Hongzhi Wang (Harbin Institute of Technology)

Graph Matching: the problem
✓ Given graphs G1 and G2, decide whether G1 matches G2. How should "matches" be defined?
✓ Applications
  • Web mirror / Web site classification
  • Complex object identification
  • Plagiarism detection
  • Social matching, keyword search, proximity search, Web service composition, …
Graph matching is widely employed in a variety of emerging real-life applications.

Graph similarity metrics: the state of the art
✓ Structure-based metrics
  • Graph homomorphism
  • Subgraph isomorphism
  • Maximum common subgraph
  • Edit distance
  • Graph simulation
Capable enough? These metrics rely on identical label matching and edge-to-edge mappings/relations.

Website matching: real-life application
[Figure: Web site graphs G1 (rooted at A.Home) and G2 (rooted at B.Index), with node labels such as books, audio, textbooks, abooks, sports, digital, categories, booksets, school, audio books, CDs, DVDs, albums, arts, features, genres; edges of G1 correspond to paths of G2 (edge-to-path mappings)]
Graph homomorphism (subgraph isomorphism) is too restrictive!

Outline
✓ Revisions of graph homomorphism
  • (1-1) P-homomorphism: node similarity, edge-to-path mapping
✓ Graph matching as optimization problems
  • Metrics for measuring graph similarity
    ▪ Maximum cardinality and overall similarity
  • Optimization problems and complexity results
✓ Approximating graph matching
  • Performance guarantees
  • Approximation algorithm
✓ Experimental study
✓ Conclusion
A first step towards revising conventional notions of graph matching

Basic notations
✓ G = (V, E, L): a labeled directed graph
✓ Similarity matrix M over G1 and G2: a matrix of size |V1| × |V2|, where M(u, v) is the similarity score of nodes u and v
✓ Similarity threshold ξ
[Figure: similarity scores between nodes of G1 (rooted at A.home) and G2 (rooted at B.index), with example values 0.7, 1.0, 0.8, 0.6, 0.6, and 0.85]
An enriched model for capturing semantic similarity
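The role of M and ξ can be sketched in a few lines of Python. This is only an illustration: the sparse dictionary encoding of M, the function name `candidate_matches`, the inclusive comparison `>= xi`, and the example scores are all assumptions, not taken from the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse encoding of M: (node of G1, node of G2) -> score.
Sim = Dict[Tuple[str, str], float]


def candidate_matches(sim: Sim, xi: float) -> Dict[str, List[str]]:
    """For each node u of G1, list the nodes v of G2 with M(u, v) >= xi."""
    cands: Dict[str, List[str]] = {}
    for (u, v), score in sim.items():
        if score >= xi:  # assumption: the threshold is inclusive
            cands.setdefault(u, []).append(v)
    # Put the most similar candidates first.
    for u, vs in cands.items():
        vs.sort(key=lambda v: sim[(u, v)], reverse=True)
    return cands


# Illustrative scores only.
M = {
    ("A.home", "B.index"): 0.7,
    ("books", "books"): 1.0,
    ("books", "booksets"): 0.6,
    ("audio", "audiobooks"): 0.8,
    ("audio", "albums"): 0.5,
}
candidates = candidate_matches(M, xi=0.6)
```

With ξ = 0.6, the pair (audio, albums) is filtered out because its score falls below the threshold.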

P-Homomorphism
✓ P-homomorphism from G1 to G2: a total mapping from V1 to V2 that
  • preserves node similarity (w.r.t. a similarity matrix M and threshold ξ)
  • maps edges to nonempty paths
✓ P-homomorphism vs. graph homomorphism
  • node similarity vs. label equality
  • edge-to-path mapping vs. edge-to-edge mapping
[Figure: "P-hom?" examples on the Web site graphs G1 and G2 and on graphs G3 and G4 with nodes labeled A to E]
Graph homomorphism is a special case of P-homomorphism
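The two conditions of the definition can be checked directly for a given mapping. The sketch below is a minimal illustration under assumptions: graphs are encoded as adjacency dictionaries, M as a dictionary of pairs, the threshold test is `>= xi`, and the names `reachable_from` and `is_p_hom` are hypothetical.

```python
from collections import deque


def reachable_from(adj, src):
    """Nodes reachable from src by a path of length >= 1 (a nonempty path)."""
    seen = set()
    queue = deque(adj.get(src, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(adj.get(node, ()))
    return seen


def is_p_hom(g1_nodes, g1_edges, g2_adj, sim, xi, rho):
    """Check that rho is a total, similarity-preserving, edge-to-path mapping."""
    # Total mapping: every node of G1 must have a match.
    if any(v not in rho for v in g1_nodes):
        return False
    # Node similarity: M(v, rho(v)) must reach the threshold (assumed inclusive).
    if any(sim.get((v, rho[v]), 0.0) < xi for v in g1_nodes):
        return False
    # Edge-to-path: each edge (a, b) of G1 needs a nonempty path rho(a) -> rho(b).
    return all(rho[b] in reachable_from(g2_adj, rho[a]) for a, b in g1_edges)


# Illustrative graphs: one edge in G1, a two-edge path in G2.
g1_nodes = ["A.home", "books"]
g1_edges = [("A.home", "books")]
g2_adj = {"B.index": ["categories"], "categories": ["booksets"], "booksets": []}
sim = {("A.home", "B.index"): 0.7, ("books", "booksets"): 0.6}
rho = {"A.home": "B.index", "books": "booksets"}

ok = is_p_hom(g1_nodes, g1_edges, g2_adj, sim, 0.6, rho)
too_strict = is_p_hom(g1_nodes, g1_edges, g2_adj, sim, 0.75, rho)
```

Here the edge (A.home, books) is mapped to the path B.index → categories → booksets, which an edge-to-edge homomorphism would reject.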

1-1 P-Homomorphism
✓ G1 is 1-1 P-homomorphic to G2 if there exists a 1-1 (injective) P-homomorphism from G1 to G2
  • distinct nodes in V1 have distinct matches in V2
✓ 1-1 P-homomorphism vs. subgraph isomorphism
  • node similarity vs. label equality
  • 1-1 edge-to-path mapping vs. bijective edge-to-edge mapping
[Figure: "1-1 P-hom?" examples on the Web site graphs G1 and G2 and on graphs G5 and G6 with nodes labeled A to E and marked nodes v1, v2]
Subgraph isomorphism is a special case of 1-1 P-homomorphism
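The extra requirement of the 1-1 variant is injectivity of the mapping. A small sketch, assuming the mapping is held in a dictionary from nodes of G1 to nodes of G2 (the name `is_injective` and the node names are hypothetical):

```python
def is_injective(rho):
    """True if no two nodes of G1 share the same match in G2."""
    return len(set(rho.values())) == len(rho)


# Two nodes of G1 mapped to the same node of G2: allowed for P-hom, not for 1-1 P-hom.
shared = {"v1": "B", "v2": "B"}
# Distinct matches: acceptable for both notions.
distinct = {"v1": "B1", "v2": "B2"}

shared_is_injective = is_injective(shared)
distinct_is_injective = is_injective(distinct)
```

This check would be applied in addition to the P-homomorphism conditions, not instead of them.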

Metrics for measuring graph similarity
✓ Not every node in one graph can find P-hom matches in the other graph …
✓ Maximum cardinality
  • The cardinality of a P-hom mapping ρ from a subgraph G1' = (V1', E1', L1') of G1 to G2:
    ▪ Card(ρ) = |V1'| / |V1|
✓ The maximum cardinality problem CPH (resp. CPH^{1-1}):
  • Input: two graphs G1 and G2
  • Output: the P-hom (resp. 1-1 P-hom) mapping ρ having the maximum Card(ρ)
MCS is a special case of CPH^{1-1}
Similarity metric based on the maximum number of nodes
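Card(ρ) is simply the fraction of nodes of G1 covered by the mapping. A minimal sketch, assuming ρ is a dictionary over the matched subset V1' (the function name `card` and the node names are hypothetical):

```python
def card(rho, v1_nodes):
    """Card(rho) = |V1'| / |V1|, where V1' is the set of matched nodes of G1."""
    matched = [v for v in v1_nodes if v in rho]
    return len(matched) / len(v1_nodes)


# Illustrative case: four of the five nodes of G1 are matched.
v1_nodes = ["a", "b", "c", "d", "e"]
rho = {"a": "x1", "b": "x2", "d": "x3", "e": "x4"}
score = card(rho, v1_nodes)
```

In this illustrative case `score` is 4/5, the same ratio form as the slide's maximum cardinality example.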

Example for CPH^{1-1}
✓ Maximum cardinality metric: 4/5 = 0.8
[Figure: graphs G5 and G6 with nodes labeled A to E and marked nodes v1, v2]

Metrics for measuring graph similarity (cont.)
✓ Overall similarity
  • The overall similarity of a P-hom mapping ρ from a subgraph G1' = (V1', E1', L1') of G1 to G2, with w(v) the weight of node v:
    ▪ $Sim(\rho) = \dfrac{\sum_{v' \in V_1'} w(v') \cdot M(v', \rho(v'))}{\sum_{v \in V_1} w(v)}$
✓ Maximum overall similarity SPH (resp. SPH^{1-1}):
  • Input: two graphs G1 and G2
  • Output: the P-hom (resp. 1-1 P-hom) mapping ρ having the maximum Sim(ρ)
Similarity metric based on the overall weighted similarity of nodes
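The overall similarity can be computed directly from its definition. The sketch below assumes dictionary encodings for the weights w and the matrix M; the function name `overall_similarity` and all numbers are illustrative.

```python
def overall_similarity(rho, weights, sim):
    """Sim(rho): weighted similarity of matched nodes over the total weight of G1."""
    total = sum(weights.values())
    matched = sum(weights[v] * sim[(v, u)] for v, u in rho.items())
    return matched / total


# Illustrative weights for the three nodes of G1; node "c" is left unmatched.
weights = {"a": 2.0, "b": 1.0, "c": 1.0}
sim = {("a", "x"): 1.0, ("b", "y"): 0.5}
rho = {"a": "x", "b": "y"}
value = overall_similarity(rho, weights, sim)  # (2*1.0 + 1*0.5) / 4
```

Unmatched nodes contribute to the denominator only, so leaving a heavy node unmatched lowers the score more than leaving a light one unmatched.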

Example for CPH and SPH
✓ Maximum overall similarity metric: (1*1 + 6*1)/8 = 0.7
[Figure: graphs G5 and G6 with nodes labeled A to E and marked nodes v1, v2; the labels 0.6, 6, and 1.0 appear in the figure]

Complexity results - Intractability
✓ Intractability
  • P-Hom and 1-1 P-Hom are NP-complete.
    ▪ NP-hard even when both G1 and G2 are directed acyclic graphs (DAGs)
    ▪ 1-1 P-Hom is NP-hard even when G1 is a tree and G2 is a DAG
    ▪ reductions from 3SAT and X3C
  • The decision problems of CPH, CPH^{1-1}, SPH^{1-1} are NP-complete.
    ▪ reductions from P-Hom and 1-1 P-Hom
    ▪ NP-hard for DAGs
P-Hom and 1-1 P-Hom are intractable. Approximation algorithms?

Complexity results - Approximation hardness
✓ Approximation hardness
  • Unless P = NP, CPH^{1-1}, SPH^{1-1} are not approximable within $O(1/n^{1-\varepsilon})$ for any constant ε, where n is the number of nodes of the input graphs.
  • Approximation factor preserving reduction (AFP-reduction) from the maximum weighted independent set problem (MWIS)
P-Hom and 1-1 P-Hom are hard to approximate

Approximation Algorithms

Problem      Complexity    Approximation complexity
P-hom        NP-complete   -
1-1 P-hom    NP-complete   -
CPH          NP-complete   Approximation-hard
CPH^{1-1}    NP-complete   Approximation-hard
SPH^{1-1}    NP-complete   Approximation-hard

Approximation bound?
✓ Given two graphs G1 = (V1, E1, L1) and G2 = (V2, E2, L2), CPH, CPH^{1-1}, SPH^{1-1} are all approximable within $O(\log^2(|V_1||V_2|) / (|V_1||V_2|))$
✓ AFP-reductions to the MWIS problem
P-Hom can be solved with a provable performance guarantee

Approximation algorithm for CPH
✓ Algorithm compMaxCard(G1, G2, M, ξ)
  • Input: two graphs G1 = (V1, E1, L1) and G2 = (V2, E2, L2), a similarity matrix M, and a similarity threshold ξ
  • Output: a P-hom mapping from a subgraph of G1 to G2
  • Key ideas
    ▪ initialize the matching list for each node in G1
    ▪ compute the transitive closure of G2
    ▪ avoid operations on the product graph
    ▪ starting from a match pair, recursively choose and include new matches in the match set via a greedy strategy, until the set can no longer be extended
  • Complexity: $O(|V_1|^3 |V_2|^2 + |V_1||E_1||V_2|^3)$
P-Hom problems can be solved with a provable performance guarantee
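The key ideas above can be illustrated with a much simplified greedy pass. This is not the authors' compMaxCard and carries none of its guarantees: it makes a single pass over the nodes of G1 instead of recursing, and the names `transitive_closure` and `greedy_p_hom`, the data encodings, and the example graphs are assumptions made for the sketch.

```python
from collections import deque


def transitive_closure(adj):
    """Map each node of G2 to the set of nodes reachable by a nonempty path."""
    closure = {}
    for src in adj:
        seen, queue = set(), deque(adj[src])
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(adj.get(node, ()))
        closure[src] = seen
    return closure


def greedy_p_hom(g1_edges, g2_adj, cands):
    """Greedily extend a partial mapping; cands holds matching lists filtered by xi."""
    reach = transitive_closure(g2_adj)
    rho = {}

    def consistent(v, u):
        # Every G1 edge touching v and an already matched node must map to a path.
        for a, b in g1_edges:
            if a == v and b == v and u not in reach.get(u, set()):
                return False
            if a == v and b in rho and rho[b] not in reach.get(u, set()):
                return False
            if b == v and a in rho and u not in reach.get(rho[a], set()):
                return False
        return True

    for v, options in cands.items():
        for u in options:  # take the first candidate that keeps rho valid
            if consistent(v, u):
                rho[v] = u
                break
    return rho


# Illustrative input: G1 has two edges out of A.home; G2 reaches both targets via paths.
g1_edges = [("A.home", "books"), ("A.home", "audio")]
g2_adj = {
    "B.index": ["categories", "digital"],
    "categories": ["booksets"],
    "digital": ["audiobooks"],
    "booksets": [],
    "audiobooks": [],
    "sports": [],
}
full = greedy_p_hom(g1_edges, g2_adj, {
    "A.home": ["B.index"], "books": ["booksets"], "audio": ["audiobooks"]})
partial = greedy_p_hom(g1_edges, g2_adj, {
    "A.home": ["B.index"], "books": ["booksets"], "audio": ["sports"]})
```

In the second call the node audio stays unmatched because sports is not reachable from B.index, so the result is a mapping from a proper subgraph of G1.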

Algorithm compMaxCard: running example
[Figure, repeated on each step: Web site graphs G1 (rooted at A.home) and G2 (rooted at B.index), with node labels such as books, book, audio, categories, textbook, sports, bookset, digital, CD, DVD, album, abook, school, art, audiobooks, features, albums, genres]
Step 1: Initialize the matching list for each node in G1 (the candidate set w.r.t. M and ξ)
Step 2: Pick a node and select a match pair
Step 3: Recursively expand the matches

Experimental Study
✓ Investigate the ability and scalability of the approximation algorithms vs. graph simulation, subgraph isomorphism, and vertex similarity
✓ Datasets
  • Real-life data: Web sites of online stores, international organizations, and online newspapers
  • Synthetic data: graph generator controlled by the number of nodes m and noise%
✓ Experimental setting
  • Graph size
    ▪ Web graphs and skeletons
    ▪ Synthetic graph size
  • Matching threshold: 0.75
  • Accuracy and efficiency

         Web graphs G(V, E, L)                          Skeleton 1 (α = 0.2)     Skeleton 2 (top-20)
Sites    # of nodes  # of edges  avgDeg(G)  maxDeg(G)   # of nodes  # of edges   # of nodes  # of edges
Site 1   20,000      42,000      4.20       510         250         10,841       20          207
Site 2   5,400       33,114      12.31      644         44          214          20          20
Site 3   7,000       16,800      4.80       500         142         4,260        20          37

Experimental Study (cont.)
Accuracy (%), values listed in the order Skeleton 1 (α = 0.2): Site 1, Site 2, Site 3; then Skeleton 2 (top-20): Site 1, Site 2, Site 3
  • compMaxCard: 80, 100, 60
  • compMaxCard^{1-1}: 40, 100, 30, 80, 100, 40
  • compMaxSim: 80, 100, 50, 90, 100, 60
  • compMaxSim^{1-1}: 20, 80, 10, 90, 100, 40
  • SF: 40, 30, 20, 80, 80, 70
  • cdkMCS: N/A, N/A, 67, 100, 0
[Callouts: our methods find more than 50% of matches; more matches than SF and cdkMCS on individual sites; graph simulation finds no match]
P-Hom algorithms find more meaningful matches
P-Hom algorithms find more matches than 1-1 P-Hom algorithms

Experimental Study (cont.)
Scalability (seconds), values listed in the order Skeleton 1 (α = 0.2): Site 1, Site 2, Site 3; then Skeleton 2 (top-20): Site 1, Site 2, Site 3
  • compMaxCard: 3.128, 0.108, 1.062, 0.078, 0.066, 0.080
  • compMaxCard^{1-1}: 2.847, 0.097, 0.840, 0.054, 0.051, 0.064
  • compMaxSim: 3.197, 0.093, 0.877, 0.051, 0.062
  • compMaxSim^{1-1}: 2.865, 0.093, 0.850, 0.053, 0.049, 0.039
  • SF: 60.275, 3.873, 7.812, 0.067, 0.158, 0.121
  • cdkMCS: N/A, N/A, 156.93, 189.16, 0.82
[Callouts: our algorithms took less than 4 seconds; cdkMCS did not run to completion; SF is more sensitive to the graph size]
P-Hom algorithms are more efficient and robust

Experimental Study (cont.)
[Figure callouts: accuracy above 65%; insensitive]
P-Hom algorithms find matches with relatively high accuracy

Experimental Study (cont.)
P-Hom algorithms scale well with the size of graphs

Experimental Study (cont.)
[Figure callout: above 50%]
Accuracy of P-Hom algorithms is sensitive to the noise

Experimental Study (cont.)
[Figure callouts: our algorithms are not sensitive to the noise; graph simulation is sensitive to the noise]
The efficiency of P-Hom algorithms is not sensitive to the noise

Conclusion
✓ P-homomorphism and 1-1 P-homomorphism: revisions of graph homomorphism / subgraph isomorphism
  • node similarity, edge-to-path mappings
  • quantitative metrics to measure graph similarity
✓ Complexity bounds of decision and optimization problems for P-hom and 1-1 P-hom
  • Intractability
  • Approximation hardness
✓ Approximation algorithms with performance guarantees on match quality
Graph homomorphism revisited for graph matching

Future work
✓ New application areas
✓ Indexing and filtering techniques
✓ Comparison of our work with feature-based approaches
✓ Incremental graph matching problem
There is much more to be done

Terrorist Collaboration Network (1970-2010)
“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)

• Approximation factor preserving reduction (AFP-reduction) from the maximum weighted independent set problem (MWIS)
[Figure: reduction diagram. A function f maps an instance x of Problem A to an instance f(x) of Problem B; a pair (g, α) maps a solution S_B of f(x) with bound ε back to a solution of x with bound α(ε)]