Graph Homomorphism Revisited for Graph Matching
Wenfei Fan, Shuai Ma, Yinghui Wu (University of Edinburgh)
Jianzhong Li, Hongzhi Wang (Harbin Institute of Technology)

Graph Matching: the problem
✓ Given graphs G1 and G2, decide whether G1 matches G2. (How to define "matches"?)
✓ Applications
  • Web mirror/Web site classification
  • Complex object identification
  • Plagiarism detection
  • Social matching, keyword search, proximity search, web service composition, ...
Widely employed in a variety of emerging real-life applications

Graph similarity metrics: the state of the art
✓ Structure-based metrics
  • Graph homomorphism
  • Subgraph isomorphism
  • Maximum common subgraph
  • Edit distance
  • Graph simulation
Capable enough? These metrics rely on identical label matching and edge-to-edge mappings/relations.

Website matching: real-life application
[Figure: two website graphs, G1 (site A, with node A.Home) and G2 (site B, with node B.Index), with node labels such as books, audio, textbooks, abooks, sports, digital, categories, booksets, school, audio books, CDs, DVDs, albums, arts, features, and genres; the matching between them uses edge-to-path mappings]
Graph homomorphism (subgraph isomorphism) is too restrictive!

Outline
✓ Revisions of graph homomorphism
  • (1-1) P-Homomorphism: node similarity, edge-to-path mapping
✓ Graph matching as optimization problems
  • Metrics for measuring graph similarity
    § Maximum cardinality and overall similarity
  • Optimization problems and complexity results
✓ Approximating graph matching
  • Performance guarantees
  • Approximation algorithm
✓ Experimental Study
✓ Conclusion
A first step towards revising conventional notions of graph matching

Basic notations
✓ G = (V, E, L): a labeled directed graph
✓ Similarity matrix M over G1 and G2: a matrix of size |V1| × |V2|, with M(u, v) the similarity score of nodes u and v
✓ Similarity threshold ξ
[Figure: the website graphs G1 (with node A.home) and G2 (with node B.index), annotated with pairwise similarity scores between node labels (0.6, 0.7, 0.8, 0.85, and 1.0 appear on the figure)]
Enriched model for capturing semantic similarity

P-Homomorphism
✓ P-homomorphism from G1 to G2: a total mapping from V1 to V2 that
  • preserves node similarity (w.r.t. a similarity matrix M and threshold ξ)
  • maps edges to nonempty paths
✓ P-homomorphism vs. graph homomorphism
  • node similarity vs. label equality
  • edge-to-path mapping vs. edge-to-edge mapping
[Figure: the website graphs G1 and G2, plus two small example graphs G3 and G4 over node labels A to E, each posed with the question "P-hom?"]
Graph homomorphism is a special case of P-homomorphism
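The two conditions in the definition can be checked mechanically for a given candidate mapping. The following is a minimal sketch, not code from the presentation: it assumes graphs are stored as adjacency dictionaries in which every node appears as a key, that the similarity matrix M is a dictionary keyed by node pairs, and that missing pairs score 0. The names `is_p_hom` and `reachable_by_nonempty_path`, the toy graphs, and the scores are all hypothetical.

```python
from collections import deque


def reachable_by_nonempty_path(adj, src, dst):
    """Return True if dst is reachable from src using at least one edge."""
    seen = set()
    queue = deque(adj.get(src, ()))  # start at the successors, so the path is nonempty
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(adj.get(node, ()))
    return False


def is_p_hom(adj1, adj2, sim, xi, rho):
    """Check whether rho is a P-homomorphism from G1 (adj1) to G2 (adj2)."""
    # Total mapping: every node of G1 is mapped, and only to nodes of G2.
    if set(rho) != set(adj1) or any(v not in adj2 for v in rho.values()):
        return False
    # Node similarity: M(u, rho(u)) must reach the threshold xi.
    if any(sim.get((u, v), 0.0) < xi for u, v in rho.items()):
        return False
    # Edge-to-path: each edge (u, u2) of G1 needs a nonempty path rho(u) -> rho(u2) in G2.
    return all(
        reachable_by_nonempty_path(adj2, rho[u], rho[u2])
        for u, succs in adj1.items()
        for u2 in succs
    )


# Hypothetical toy data: the edge books -> textbooks maps to the
# path books -> categories -> school.
g1 = {"books": ["textbooks"], "textbooks": []}
g2 = {"books": ["categories"], "categories": ["school"], "school": []}
m = {("books", "books"): 1.0, ("textbooks", "school"): 0.8}
rho = {"books": "books", "textbooks": "school"}
print(is_p_hom(g1, g2, m, 0.75, rho))
```

Raising the threshold above 0.8 in this toy example makes the check fail on node similarity, while an ordinary graph homomorphism check would already fail because G2 has no direct edge from books to school.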

1-1 P-Homomorphism
✓ G1 is 1-1 P-homomorphic to G2 if there exists a 1-1 (injective) P-homomorphism from G1 to G2.
  • distinct nodes in V1 have distinct matches in V2
✓ 1-1 P-homomorphism vs. subgraph isomorphism
  • node similarity vs. label equality
  • 1-1 edge-to-path mapping vs. bijective edge-to-edge mapping
[Figure: the website graphs G1 and G2, plus two small example graphs G5 and G6 over node labels A to E with marked nodes v1 and v2, each posed with the question "1-1 P-hom?"]
Subgraph isomorphism is a special case of 1-1 P-homomorphism
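The only condition that 1-1 P-homomorphism adds is injectivity of the mapping. A minimal sketch of that single check follows, assuming the mapping is a dictionary from nodes of G1 to nodes of G2; the name `is_injective` and the sample mappings are hypothetical, and the node-similarity and edge-to-path conditions still have to be verified separately.

```python
def is_injective(rho):
    """Return True if distinct nodes of G1 are mapped to distinct nodes of G2."""
    return len(set(rho.values())) == len(rho)


# Hypothetical mappings: the second one sends two nodes of G1 to the same node of G2.
print(is_injective({"books": "book", "abooks": "audiobooks"}))
print(is_injective({"books": "book", "textbooks": "book"}))
```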

Metrics for measuring graph similarity
✓ Not every node in one graph can find P-hom matches in the other graph ...
✓ Maximum cardinality
  • The cardinality of a P-hom mapping ρ from a subgraph G1' = (V1', E1', L1') of G1 to G2:
    § Card(ρ) = |V1'| / |V1|
✓ The maximum cardinality problem CPH (resp. CPH 1-1):
  • Input: two graphs G1 and G2
  • Output: the P-hom (resp. 1-1 P-hom) mapping ρ having the maximum Card(ρ)
MCS is a special case of CPH 1-1
Similarity metric based on the maximum number of nodes

Example for CPH 1-1
✓ Maximum cardinality metric: 4/5 = 0.8
[Figure: graphs G5 and G6 over node labels A to E, with marked nodes v1 and v2]

Metrics for measuring graph similarity (cont.)
✓ Overall similarity
  • The overall similarity of a P-hom mapping ρ from a subgraph G1' of G1 to G2, where w(v) is the weight of node v:
    § $\mathrm{Sim}(\rho) = \dfrac{\sum_{v \in V_1'} w(v) \cdot M(v, \rho(v))}{\sum_{v \in V_1} w(v)}$
✓ Maximum overall similarity SPH (resp. SPH 1-1):
  • Input: two graphs G1 and G2
  • Output: the P-hom (resp. 1-1 P-hom) mapping ρ having the maximum Sim(ρ)
Similarity metric based on overall weighted similarity of nodes
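Both metrics, Card(ρ) and Sim(ρ), are simple ratios once a mapping ρ is fixed. The sketch below computes them under the assumption that ρ is a dictionary defined on the matched nodes V1' only, that node weights are given as a dictionary over all of V1, and that M is a dictionary keyed by node pairs. The function names, node names, weights, and scores are hypothetical and are not taken from the slides.

```python
def card(rho, v1):
    """Card(rho) = |V1'| / |V1|, with V1' the set of matched nodes of G1."""
    return len(rho) / len(v1)


def overall_sim(rho, v1, weight, sim):
    """Sim(rho): weighted similarity of matched nodes over the total weight of V1."""
    total_weight = sum(weight[v] for v in v1)
    matched = sum(weight[u] * sim[(u, v)] for u, v in rho.items())
    return matched / total_weight


# Hypothetical data: four nodes in G1, three of them matched.
v1 = ["a", "b", "c", "d"]
weight = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}
m = {("a", "x"): 1.0, ("b", "y"): 0.5, ("c", "z"): 1.0}
rho = {"a": "x", "b": "y", "c": "z"}

print(card(rho, v1))                    # 3 matched nodes out of 4
print(overall_sim(rho, v1, weight, m))  # (2*1.0 + 1*0.5 + 1*1.0) / 5
```

The two metrics can rank mappings differently: dropping a heavily weighted node costs Sim(ρ) more than it costs Card(ρ), which counts every node equally.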

Example for CPH and SPH
✓ Maximum overall similarity metric: (1*1+6*1)/8 = 0.7
[Figure: graphs G5 and G6 over node labels A to E, with marked nodes v1 and v2; the labels 0.6, 6, and 1.0 appear on the figure]

Complexity results: Intractability
✓ Intractability
  • P-Hom and 1-1 P-Hom are NP-complete.
    § NP-hard when both G1 and G2 are directed acyclic graphs (DAGs)
    § NP-hard for 1-1 P-Hom when G1 is a tree and G2 is a DAG
    § reductions from 3SAT and X3C
  • The decision problems of CPH, CPH 1-1, SPH 1-1 are NP-complete.
    § reductions from P-Hom and 1-1 P-Hom
    § NP-hard for DAGs
P-Hom and 1-1 P-Hom are intractable. Approximation algorithms?

Complexity results: Approximation Hardness
✓ Approximation hardness
  • Unless P = NP, CPH 1-1, SPH 1-1 are not approximable within $O(1/n^{1-\epsilon})$ for any constant ε, with n the number of nodes of the input graphs.
  • Approximation factor preserving reduction (AFP-reduction) from the maximum weighted independent set problem (MWIS)
P-Hom and 1-1 P-Hom are hard to approximate

Approximation Algorithms

Problems     Complexity     Approximation complexity
P-hom        NP-complete    -
1-1 P-hom    NP-complete    -
CPH          NP-complete    Approximation-hard
CPH 1-1      NP-complete    Approximation-hard
SPH 1-1      NP-complete    Approximation-hard

Approximation bound?
✓ Given two graphs G1 = (V1, E1, L1) and G2 = (V2, E2, L2), CPH, CPH 1-1, SPH 1-1 are all approximable within $O(\log^2(|V_1||V_2|) / (|V_1||V_2|))$
✓ AFP-reductions to the MWIS problem
P-Hom can be solved with a provable performance guarantee

Approximation algorithm for CPH
✓ Algorithm compMaxCard(G1, G2, M, ξ)
  • Input: two graphs G1 = (V1, E1, L1) and G2 = (V2, E2, L2), a similarity matrix M, and a similarity threshold ξ
  • Output: a P-hom mapping from a subgraph of G1 to G2
  • Key ideas (avoid operations on the product graph)
    § initialize a matching list for each node in G1
    § compute the transitive closure of G2
    § starting from a match pair, recursively choose and include new matches in the match set until it can no longer be extended, via a greedy strategy
  • Complexity: $O(|V_1|^3 |V_2|^2 + |V_1||E_1||V_2|^3)$
P-Hom problems can be solved with a provable performance guarantee
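The three key ideas can be illustrated with a much simpler greedy procedure. The sketch below is an assumption-laden illustration and is not compMaxCard itself: it builds matching lists from M and ξ, precomputes reachability in G2, and then extends a partial mapping one node at a time whenever the new pair keeps every edge between matched nodes mapped to a nonempty path. It carries none of the approximation guarantees stated on the slides. Graphs are assumed to be adjacency dictionaries with every node as a key, and all names (`transitive_closure`, `greedy_p_hom`) and toy data are hypothetical.

```python
def transitive_closure(adj):
    """Map each node to the set of nodes reachable from it by a nonempty path."""
    closure = {}
    for src in adj:
        seen, stack = set(), list(adj[src])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node])
        closure[src] = seen
    return closure


def greedy_p_hom(adj1, adj2, sim, xi):
    """Greedily build a P-hom mapping from a subgraph of G1 to G2 (illustrative only)."""
    reach = transitive_closure(adj2)
    # Matching list: candidates in G2 whose similarity to u reaches the threshold.
    cands = {u: [v for v in adj2 if sim.get((u, v), 0.0) >= xi] for u in adj1}

    def consistent(u, v, rho):
        # Every G1 edge between u and an already matched node must map to a path in G2.
        if u in adj1[u] and v not in reach[v]:
            return False
        for u2, v2 in rho.items():
            if u2 in adj1[u] and v2 not in reach[v]:
                return False
            if u in adj1[u2] and v not in reach[v2]:
                return False
        return True

    rho = {}
    # Heuristic order: nodes with the fewest candidates first, best-scoring candidate first.
    for u in sorted(adj1, key=lambda n: len(cands[n])):
        for v in sorted(cands[u], key=lambda c: -sim.get((u, c), 0.0)):
            if consistent(u, v, rho):
                rho[u] = v
                break
    return rho


# Hypothetical toy data loosely modeled on the website example.
g1 = {"books": ["textbooks", "abooks"], "textbooks": [], "abooks": []}
g2 = {"books": ["categories"], "categories": ["school", "audiobooks"],
      "school": [], "audiobooks": []}
m = {("books", "books"): 1.0, ("textbooks", "school"): 0.8,
     ("abooks", "audiobooks"): 0.8}
print(greedy_p_hom(g1, g2, m, 0.75))
```

Precomputing reachability once turns each edge-to-path test into a set lookup, which is the role the transitive closure of G2 plays in the key ideas above.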

Algorithm compMaxCard: running example
[Figure: the website graphs G1 (with node A.home) and G2 (with node B.index)]

Algorithm compMaxCard: running example (cont.)
[Figure: G1 and G2, with the candidate set of each G1 node w.r.t. M and ξ]
Step 1: Initialize the matching list for each node in G1

Algorithm compMaxCard: running example (cont.)
[Figure: G1 and G2, with one selected match pair]
Step 2: Pick a node and select a match pair

Algorithm compMaxCard: running example (cont.)
[Figure: G1 and G2, with the match set being extended]
Step 3: Recursively expand the matches

Experimental Study
✓ Investigate the ability and scalability of the approximation algorithms vs. graph simulation, subgraph isomorphism, and vertex similarity
✓ Datasets
  • Real-life data: Web sites of online stores, international organizations, and online newspapers
  • Synthetic data: graph generator controlled by the number of nodes m and the noise%
✓ Experimental Setting
  • Graph size
    § Web graphs and skeletons
    § Synthetic graph size
  • Matching threshold: 0.75
  • Accuracy and efficiency

Web graphs G(V, E, L)
Sites     # of nodes    # of edges    avgDeg(G)    maxDeg(G)
Site 1    20,000        42,000        4.20         510
Site 2    5,400         33,114        12.31        644
Site 3    7,000         16,800        4.80         500

Web skeletons
          Skeleton 1 (α = 0.2)         Skeleton 2 (top-20)
Sites     # of nodes    # of edges     # of nodes    # of edges
Site 1    250           10,841         20            207
Site 2    44            214            20            20
Site 3    142           4,260          20            37

Experimental Study (cont.)
Accuracy (%)

                   Skeleton 1 (α = 0.2)        Skeleton 2 (top-20)
Algorithms         Site 1   Site 2   Site 3    Site 1   Site 2   Site 3
compMaxCard 1-1    40       100      30        80       100      40
compMaxSim         80       100      50        90       100      60
compMaxSim 1-1     20       80       10        90       100      40
SF                 40       30       20        80       80       70

Rows with missing entries in the source, values in extracted order:
compMaxCard: 80, 100, 60
cdkMCS: N/A, N/A, 67, 100, 0

  • Our methods find more than 50% of matches
  • More matches than SF and than cdkMCS on individual sites
  • Graph simulation finds no match
  • P-Hom algorithms find more matches than 1-1 P-Hom algorithms
P-Hom algorithms find more meaningful matches

Experimental Study (cont.)
Scalability (seconds)

                   Skeleton 1 (α = 0.2)        Skeleton 2 (top-20)
Algorithms         Site 1   Site 2   Site 3    Site 1   Site 2   Site 3
compMaxCard        3.128    0.108    1.062     0.078    0.066    0.080
compMaxCard 1-1    2.847    0.097    0.840     0.054    0.051    0.064
compMaxSim 1-1     2.865    0.093    0.850     0.053    0.049    0.039
SF                 60.275   3.873    7.812     0.067    0.158    0.121

Rows with missing entries in the source, values in extracted order:
compMaxSim: 3.197, 0.093, 0.877, 0.051, 0.062
cdkMCS: N/A, N/A, 156.93, 189.16, 0.82

  • Our algorithms took less than 4 seconds
  • cdkMCS did not run to completion
  • SF is more sensitive to the graph size
P-Hom algorithms are more efficient and robust

Experimental Study (cont.)
[Figure: accuracy plots; callouts: "Accuracy above 65%", "Insensitive"]
P-Hom algorithms find matches with relatively high accuracy

Experimental Study (cont.)
[Figure: scalability plots]
P-Hom algorithms scale well with the size of graphs

Experimental Study (cont.)
[Figure: accuracy vs. noise plots; callout: "Above 50%"]
Accuracy of P-Hom algorithms is sensitive to the noise

Experimental Study (cont.)
[Figure: running time vs. noise plots; callouts: "Our algorithms are not sensitive to the noise", "Graph simulation is sensitive to the noise"]
Efficiency of P-Hom algorithms is not sensitive to the noise

Conclusion
✓ P-homomorphism and 1-1 P-homomorphism, revisions of graph homomorphism/subgraph isomorphism
  • node similarity, edge-to-path mappings
  • quantitative metrics to measure graph similarity
✓ Complexity bounds of decision and optimization problems for P-hom and 1-1 P-hom
  • Intractability
  • Approximation hardness
✓ Approximation algorithms with performance guarantees on match quality
Graph homomorphism revisited for graph matching

Future work
✓ New application areas
✓ Indexing and filtering techniques
✓ Comparison of our work with feature-based approaches
✓ Incremental graph matching problem
There is much more to be done

Terrorist Collaboration Network (1970-2010)
"Those who were trained to fly didn't know the others. One group of people did not know the other group." (Bin Laden)

• Approximation factor preserving reduction (AFP-reduction) from the maximum weighted independent set problem (MWIS)
[Diagram: f maps an instance x of Problem A to an instance f(x) of Problem B; g maps a solution S_B of f(x), together with ε, back to a solution g(x, S_B, ε) of x; α maps ε to α(ε)]