Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs • Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad • LLNL • 8/13/2007, KDD 2007, San Jose
Input: query graph and attributed data graph • Output: matching subgraph
Terminology: "Conform" • The matching subgraph conforms to the query graph
Terminology: "Interception" • Figure: query graph and matching subgraph, in which two matching nodes are joined through an intermediate node • Path 12-13-4 is an interception
Terminology: "Instantiate" • Query graph Hq, matching subgraph Ht • Node 11 instantiates the SEC node • Ht instantiates Hq
Roadmap • Introduction – Problem Definition – Motivation • How to: Graph X-Ray • Experimental Results • Conclusion
Motivation: Why Not SQL? • Case 1: No exact match exists – Q: How to find an approximate answer? • Case 2: Too many exact matches – Q: How to rank them?
Motivation: Why Not SQL? (Cont.) • Case 3: The exact match might not be the best answer – "Find a CEO who has heavy contact with an Accountant" • Q: How to find the right one? • Figure: exact match (1 direct connection) vs. inexact match (many indirect connections)
Motivation: Efficiency • Why not subgraph isomorphism? – Polynomial only for a fixed number of query (pattern) nodes • Q1: How to scale up linearly? • Q2: … and with a small slope?
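To make "polynomial only for a fixed pattern size" concrete, here is a standard counting argument (not taken from the slides): a brute-force matcher that tries every injective assignment of k query nodes to n data nodes examines

```latex
\frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1) = \Theta\!\left(n^{k}\right) \quad \text{for fixed } k
```

assignments, so the degree of the polynomial grows with the query size. For the DBLP graph used in the experiments (n ≈ 315k authors) and a 4-node query, that is roughly 10^22 candidate assignments, far from linear.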
Wish List • Effectiveness – Both exact and inexact matches – Ranking among multiple results – "Best" answer (proximity-based) • Efficiency – Scale linearly – Scale with a small slope • G-Ray meets all!
Preliminary: Center-Piece Subgraph (CePS) [Tong+] • Figure: original graph with the query nodes Q shown in black, and the resulting CePS • CePS is … meta … opt. in G-Ray!
Preliminary: Augmented Graph • Data nodes: 1, … • Attribute nodes: a, … • Footnote: the augmented graph is crucial for computation!
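A minimal sketch of building an augmented graph, assuming (our assumption; the slide's figure is not preserved) that each distinct attribute value becomes one attribute node linked to every data node that carries it. The function name `build_augmented_graph` and the `("attr", label)` keying are hypothetical. With this construction, proximity from an attribute node such as "CEO" presumably measures closeness to all CEO data nodes at once, which is one way to read "crucial for computation".

```python
from collections import defaultdict


def build_augmented_graph(edges, attributes):
    """Return an undirected adjacency map with one extra node per attribute value.

    edges: iterable of (u, v) pairs between data nodes.
    attributes: dict mapping data node -> iterable of attribute labels.
    Attribute nodes are keyed as ("attr", label) so they cannot collide
    with data-node ids.
    """
    adj = defaultdict(set)
    # Copy the data edges (undirected).
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Link every data node to one attribute node per label it carries.
    for node, labels in attributes.items():
        adj.setdefault(node, set())  # keep isolated data nodes in the map
        for label in labels:
            attr_node = ("attr", label)
            adj[node].add(attr_node)
            adj[attr_node].add(node)
    return dict(adj)
```

For example, `build_augmented_graph([(1, 2), (2, 3)], {1: ["CEO"], 3: ["CEO", "Manager"]})` links the single "CEO" attribute node to both data nodes 1 and 3.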
G-Ray: quick overview (for a loop query) • Step 1: SF • Step 2: NE • Step 3: BR • Step 4: NE • Step 5: BR • Step 6: NE • Step 7: BR • Step 8: BR • SF: Seed-Finder; NE: Neighborhood-Expander; BR: Bridge
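One way to derive such a step sequence from the query graph is sketched below, assuming (our assumption, not stated on the slide) a breadth-first order: seed one query node, then for each newly reached query node run NE followed by BR on the edge that reached it, and finally BR every remaining query edge. The helper `plan_steps` and the node names A-D are hypothetical. For a 4-node loop query this reproduces the slide's SF, NE, BR, NE, BR, NE, BR, BR pattern.

```python
from collections import deque


def plan_steps(query_adj, seed):
    """Order SF/NE/BR steps for a connected query graph (hypothetical helper).

    query_adj: dict query node -> set of neighboring query nodes.
    Returns a list of tuples: ("SF", node), ("NE", from, to), ("BR", u, v).
    """
    steps = [("SF", seed)]
    matched = {seed}
    bridged = set()
    queue = deque([seed])
    # Breadth-first: instantiate each new query node next to a matched one.
    while queue:
        u = queue.popleft()
        for v in sorted(query_adj[u]):
            if v not in matched:
                steps.append(("NE", u, v))  # instantiate v near u
                matched.add(v)
                queue.append(v)
                steps.append(("BR", u, v))  # connect u and v in the data graph
                bridged.add(frozenset((u, v)))
    # Bridge the query edges not used above (e.g., the edge closing a loop).
    for u in sorted(query_adj):
        for v in sorted(query_adj[u]):
            edge = frozenset((u, v))
            if edge not in bridged:
                steps.append(("BR", u, v))
                bridged.add(edge)
    return steps
```

With the loop `{"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}` seeded at "A", the last step is the loop-closing bridge `("BR", "C", "D")`.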
Seed-Finder • Q: How to instantiate the SEC node? • A: [formula] • Footnote: node 11 is close to some unknown data nodes for "CEO", "Account." and "Manager"
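The slide's formula is not preserved. The sketch below assumes a CePS-style "AND" score: among data nodes carrying the seed query node's attribute (e.g., SEC), pick the one whose product of proximities from the attribute nodes of its query neighbors (e.g., "CEO", "Account.", "Manager") is largest. The function name, the dict-of-dicts `prox` layout, and the example numbers are hypothetical.

```python
import math


def find_seed(candidates, attr_nodes, prox):
    """Pick the candidate with the highest product of proximities from attr_nodes.

    candidates: data nodes carrying the seed query node's attribute.
    attr_nodes: attribute nodes of the seed's query neighbors.
    prox: dict of dicts, prox[a][i] = proximity from node a to node i
          (missing entries count as 0).
    """
    def score(i):
        # Product ("AND"): a candidate far from any one attribute scores low.
        return math.prod(prox[a].get(i, 0.0) for a in attr_nodes)

    return max(candidates, key=score)
```

With made-up proximities `{"CEO": {10: 0.2, 11: 0.3}, "Acct": {10: 0.4, 11: 0.3}, "Mgr": {10: 0.1, 11: 0.2}}`, node 11 wins (0.018 vs. 0.008) even though node 10 is closer to "Acct".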
Neighborhood-Expander • Q: How to instantiate the CEO node? – Step 1 → Step 2? • A: [formula] • Footnote: – Step 3 → Step 4? – Step 5 → Step 6?
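Since the slide's formula is lost, here is a minimal sketch under the assumption that NE picks, among data nodes with the required attribute (e.g., CEO), the one with the highest proximity from an already-instantiated node (e.g., node 11), skipping nodes already matched. The name `expand_neighborhood`, the one-directional `prox[anchor][j]` score, and the example values are hypothetical; the paper's actual score may differ.

```python
def expand_neighborhood(anchor, candidates, prox, blocked=()):
    """Return the unblocked candidate closest to anchor, or None if none remain.

    anchor: an already-matched data node (e.g., the seed).
    candidates: data nodes carrying the attribute the next query node requires.
    prox: dict of dicts, prox[anchor][j] = proximity from anchor to j.
    blocked: data nodes already used by the partial match.
    """
    pool = [j for j in candidates if j not in blocked]
    if not pool:
        return None
    # Greedy choice: highest proximity from the anchor (missing entries count as 0).
    return max(pool, key=lambda j: prox[anchor].get(j, 0.0))
```

The same call covers the footnote's later transitions: only the anchor, the candidate set, and the blocked set change.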
Bridge • Q: Step 6 (NE) → Step 7 (BR)? • A: Prim-like algorithm – To maximize [formula] – Should block nodes 11 and 7 • Footnote: – Connection subgraph, or one single path?
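A minimal sketch of a Prim-like (greedy best-first) bridge, assuming the quantity to maximize is each new node's proximity to the target matching node (the slide's exact objective is lost). It grows a tree from the source, always attaching the frontier node closest to the target, never entering blocked nodes (e.g., other matched nodes such as 11 and 7), and returns a single path. The function name and argument layout are hypothetical.

```python
import heapq
import itertools


def bridge(adj, source, target, prox_to_target, blocked=()):
    """Grow a tree from source toward target; return a path list or None.

    adj: dict node -> iterable of neighbors.
    prox_to_target: dict node -> proximity of that node to target (higher = closer).
    blocked: nodes the path must avoid.
    """
    blocked = set(blocked) - {source, target}
    parent = {source: None}  # nodes already in the tree
    tie = itertools.count()  # tie-breaker so nodes themselves are never compared
    heap = []  # max-heap on proximity via negated scores

    def push_neighbors(u):
        for v in adj.get(u, ()):
            if v not in parent and v not in blocked:
                heapq.heappush(heap, (-prox_to_target.get(v, 0.0), next(tie), v, u))

    push_neighbors(source)
    while heap:
        _, _, v, u = heapq.heappop(heap)
        if v in parent:
            continue  # already attached through an earlier frontier entry
        parent[v] = u
        if v == target:
            path = [v]  # walk parent pointers back to the source
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        push_neighbors(v)
    return None  # target unreachable without crossing blocked nodes
```

Returning one path rather than a whole connection subgraph is a design choice here; the slide's footnote leaves that open.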
Experimental Results • Dataset: DBLP – Nodes: authors (315k) – Edges: co-authorship (1,800k) – Attributes: conference & year (13k), e.g., KDD-2001, SIGMOD…
Effectiveness: star query • Figure: query and result
Effectiveness: line query • Figure: query and result
Effectiveness: loop query • Figure: query and result
Efficiency: Response Time • Figure: response time vs. # of edges • Scales linearly • Small slope • 3-5 seconds on ~2M edges
Conclusion • Graph X-Ray (G-Ray) – Best-effort pattern matching in large attributed graphs – Scales linearly, with a small slope • More details in the Poster Session – Monday (tonight) – Board number 8
Thank you! www.cs.cmu.edu/~htong
Backup Slides
Proximity on Graph (a.k.a. relevance, closeness) • Figure: example graph with nodes 1-12 • How to: random walk with restart • Multi-faceted • Punishes long paths • Accounts for edge weights
Random walk with restart • Figure: RWR scores on the example graph, starting from node 4; nearby nodes get higher scores (more red = more relevant) • Ranking vector over nodes 1-12: node 1: 0.13, node 2: 0.10, node 3: 0.13, node 4: 0.22 (highest), node 5: 0.13, …; the lowest score shown is 0.02
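A minimal power-iteration sketch of random walk with restart: at each step the walker follows a weighted edge with probability 1 - c or jumps back to the start node with probability c. The restart probability c = 0.15, the column-normalized convention, and the function name are our assumptions; the slide does not give its parameters, so this will not reproduce the figure's exact numbers.

```python
import numpy as np


def rwr(adj_matrix, start, c=0.15, tol=1e-10, max_iter=1000):
    """Random-walk-with-restart scores from `start` via power iteration.

    adj_matrix: (n, n) nonnegative weights; W[i, j] = weight of edge i-j.
    c: restart probability (assumed value).
    Returns r with r[j] = steady-state probability of being at node j.
    """
    W = np.asarray(adj_matrix, dtype=float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    P = W / col_sums  # column-normalized: P[i, j] = Pr(step j -> i)
    e = np.zeros(W.shape[0])
    e[start] = 1.0  # restart vector
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - c) * P @ r + c * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

Edge weights enter through `W`, and long paths are punished because every extra hop multiplies by 1 - c. On the path graph 0-1-2 started at node 0, node 0 outranks node 2; note the hub node 1 can still score above the start node.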
How to rank the results • Our goodness function – Measure the proximity between every two matching nodes that are required to be connected, in both directions (two-way) – Multiply these proximities together • In G-Ray, we approximately optimize this goodness function • If there are multiple matching subgraphs, we rank them by this goodness function
How to rank the results (Cont.) • Figure: matching nodes 12, 4, 7, 11 • Goodness = Prox(12, 4) × Prox(4, 12) × Prox(7, 4) × Prox(4, 7) × Prox(11, 7) × Prox(7, 11) × Prox(12, 11) × Prox(11, 12)
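The goodness function above can be written directly as code: multiply the two-way proximities over every pair of matching nodes the query requires to be connected, then sort candidate matchings by that score. The function names, the dict-of-dicts `prox` layout, and the example proximity values are hypothetical; in practice the proximities would come from random walk with restart.

```python
import math


def goodness(required_pairs, prox):
    """Product of two-way proximities over all required pairs of matching nodes.

    required_pairs: (u, v) pairs of matching nodes whose query nodes share an edge.
    prox: dict of dicts, prox[u][v] = proximity from u to v.
    """
    return math.prod(prox[u][v] * prox[v][u] for u, v in required_pairs)


def rank_subgraphs(candidates, prox):
    """Sort candidate matchings (each a list of required pairs), best first."""
    return sorted(candidates, key=lambda pairs: goodness(pairs, prox), reverse=True)
```

For the slide's example the required pairs are `[(12, 4), (7, 4), (11, 7), (12, 11)]`, giving exactly the eight Prox factors listed above.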