Adding Regular Expressions to Graph Reachability and Pattern Queries
Wenfei Fan, Nan Tang, Shuai Ma, Yinghui Wu, Jianzhong Li
University of Edinburgh; Harbin Institute of Technology
Yinghui Wu, ICDE 2011
Outline
✓ Real-life graphs bear multiple edge types
  • traditional models and methods may not be expressive enough
✓ Reachability queries and graph pattern queries
  • nodes carrying predicates
  • edges carrying regular expressions
✓ Fundamental problems
  • query containment and equivalence
  • query minimization
✓ Query evaluation
  • join-based and split-based algorithms
✓ Conclusion
A first step towards revising simulation for graph pattern matching
Graph Pattern Matching: the problem
✓ Given a pattern graph (a query) P and a data graph G, decide whether G matches P and, if so, find all matches of P in G. (How should a match be defined?)
✓ Applications
  • social queries, social matching
  • biology and chemistry network querying
  • keyword search, proximity search, …
Widely employed in a variety of emerging real-life applications
Subgraph isomorphism and graph simulation
✓ Node label equivalence
✓ Edge-to-edge function (subgraph isomorphism) or relation (graph simulation)
[Figure: example pattern graphs P and data graphs G with nodes labeled A, B, D, E]
Expressive enough? Both require identical label matching and an edge-to-edge function/relation.
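For reference, plain graph simulation (identical labels, edge-to-edge relation) can be computed by shrinking candidate sets until nothing changes. The sketch below is a naive fixpoint rather than an optimized algorithm; the dict/set graph encoding and the name `graph_simulation` are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: (node -> label, set of directed edges).
Graph = Tuple[Dict[str, str], Set[Tuple[str, str]]]


def graph_simulation(pattern: Graph, data: Graph) -> Dict[str, Set[str]]:
    """Maximum simulation relation, returned as sim[u] = data nodes that match u."""
    p_labels, p_edges = pattern
    d_labels, d_edges = data
    succ: Dict[str, Set[str]] = {v: set() for v in d_labels}
    for v, w in d_edges:
        succ[v].add(w)
    # Node label equivalence: u may only match data nodes carrying the same label.
    sim = {u: {v for v, lab in d_labels.items() if lab == p_labels[u]}
           for u in p_labels}
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # Edge-to-edge relation: v stays a match of u only if a child of v matches u2.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


P = ({"a": "A", "b": "B"}, {("a", "b")})
G = ({"v1": "A", "v2": "B", "v3": "A", "v4": "B"}, {("v1", "v2")})
print(graph_simulation(P, G))  # v3 is dropped: it has no B-labeled child
```

Under this sketch, G matches P when every `sim[u]` is nonempty. Subgraph isomorphism would additionally require an injective node function, which a simulation relation does not enforce.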
Considering edge types…
[Figure: Essembly, a social voting network, showing Alice the journalist, biologists, doctors and a businessman connected by edges of types strangers-nemeses, strangers-allies and friends-nemeses]
Real-life graphs have multiple edge types
Querying the Essembly network: an example
[Figure: a pattern (left) and the Essembly network (right). Pattern nodes: Alice, biologists supporting cloning, doctors against cloning; pattern edges labeled fa+, fa<=2, sa<=2, sn, fn, where fa, fn, sa, sn abbreviate friends-allies, friends-nemeses, strangers-allies, strangers-nemeses]
Pattern queries with multiple edge types
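Graph reachability and pattern queries
✓ Real-life graphs usually bear different edge types
✓ Data graph $G = (V, E, f_A, f_C)$, where $f_A$ assigns attributes to nodes and $f_C$ assigns a color to each edge
  • Reachability query (RQ): $Q_r = (u_1, u_2, f_{u_1}, f_{u_2}, f_e)$, where $f_{u_1}$ and $f_{u_2}$ are predicates on node attributes and $f_e$ is drawn from the following subclass of regular expressions:
    § $F ::= c \mid c^{\le k} \mid c^{+} \mid F\,F$
✓ $Q_r(G)$: the set of node pairs $(v_1, v_2)$ such that $v_1$ satisfies $f_{u_1}$, $v_2$ satisfies $f_{u_2}$, there is a nonempty path from $v_1$ to $v_2$, and the edge colors on the path match the pattern specified by $f_e$
[Example RQ: from Job=‘biologist’, sp=‘cloning’ to Job=‘doctors’, with $f_e$ = fa<=2 fn]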
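To make the semantics of $Q_r(G)$ concrete, here is a minimal sketch that evaluates an RQ by breadth-first search over states (node, current atom, edges consumed by that atom). It assumes that c<=k means 1 to k consecutive c-colored edges and c+ means one or more; the textual syntax parsed by `atom` and all function names are hypothetical, and this is not the paper's evaluation algorithm.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set, Tuple

# (color, min repeats, max repeats or None for "+"); assumed reading of c, c<=k, c+.
Atom = Tuple[str, int, Optional[int]]


def atom(token: str) -> Atom:
    """Parse 'c', 'c<=k' or 'c+' (hypothetical textual syntax)."""
    if token.endswith("+"):
        return (token[:-1], 1, None)
    if "<=" in token:
        color, k = token.split("<=")
        return (color, 1, int(k))
    return (token, 1, 1)


def eval_rq(nodes: Dict[str, dict], edges: List[Tuple[str, str, str]],
            f_u1: Callable[[dict], bool], f_u2: Callable[[dict], bool],
            f_e: List[Atom]) -> Set[Tuple[str, str]]:
    """Q_r(G): pairs (v1, v2) joined by a nonempty path whose colors match f_e."""
    out: Dict[str, List[Tuple[str, str]]] = {v: [] for v in nodes}
    for v, w, color in edges:
        out[v].append((w, color))
    last = len(f_e) - 1
    result: Set[Tuple[str, str]] = set()
    for v1, attrs in nodes.items():
        if not f_u1(attrs):
            continue
        # State (node, atom index i, edges consumed by atom i); the first edge must match atom 0.
        start = {(w, 0, 1) for w, c in out[v1] if c == f_e[0][0]}
        seen = set(start)
        queue = deque(start)
        while queue:
            v, i, n = queue.popleft()
            color, lo, hi = f_e[i]
            if i == last and n >= lo and f_u2(nodes[v]):
                result.add((v1, v))
            for w, c in out[v]:
                successors = []
                if c == color and (hi is None or n < hi):
                    # Stay in atom i; for c+ the counter saturates so the state space stays finite.
                    successors.append((w, i, n + 1 if hi is not None else min(n + 1, lo)))
                if i < last and n >= lo and c == f_e[i + 1][0]:
                    successors.append((w, i + 1, 1))  # Move on to atom i + 1.
                for s in successors:
                    if s not in seen:
                        seen.add(s)
                        queue.append(s)
    return result


nodes = {
    "b1": {"job": "biologist", "sp": "cloning"},
    "x": {"job": "businessman"}, "y": {"job": "journalist"}, "z": {"job": "journalist"},
    "d1": {"job": "doctors"}, "d2": {"job": "doctors"}, "d3": {"job": "doctors"},
}
edges = [("b1", "x", "fa"), ("x", "y", "fa"), ("y", "d1", "fn"),
         ("b1", "d2", "fn"), ("y", "z", "fa"), ("z", "d3", "fn")]
is_bio = lambda a: a.get("job") == "biologist" and a.get("sp") == "cloning"
is_doc = lambda a: a.get("job") == "doctors"
print(eval_rq(nodes, edges, is_bio, is_doc, [atom("fa<=2"), atom("fn")]))
# {('b1', 'd1')}: d2 has no fa edge before fn, and d3 needs three fa edges
```

The toy graph above is invented for illustration. Running one search per source node is simple but repeats work; an index over colored distances would be the natural next step.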
Graph pattern queries
✓ A graph pattern query (PQ) is $Q_p = (V_p, E_p, f_v, f_e)$ where, for each edge $e = (u, u')$, $Q_e = (u, u', f_v(u), f_v(u'), f_e(e))$ is an RQ
✓ $Q_p(G)$ is the maximum (and unique) set of pairs $(e, S_e)$, with $S_e \subseteq Q_e(G)$, such that:
  • for any two edges $e_1 = (u_1, u_2)$ and $e_2 = (u_2, u_3)$, if $(v_1, v_2) \in S_{e_1}$, then there is a $v_3$ such that $(v_2, v_3) \in S_{e_2}$
  • for any two edges $e_1 = (u_1, u_2)$ and $e_2 = (u_1, u_3)$, if $(v_1, v_2) \in S_{e_1}$, then there is a $v_3$ such that $(v_1, v_3) \in S_{e_2}$
✓ RQs and simulation are special cases of PQs
✓ PQ vs. simulation
  • search conditions on query nodes
  • edges mapped to paths
  • the edges on each path constrained by a regular expression
[Figure: example PQ with nodes Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’, dsp=‘cloning’, and edges labeled fa+, fa<=2, sa<=2, sn, fn]
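The two conditions in the definition of $Q_p(G)$ can be read as a test on a candidate assignment $e \mapsto S_e$. The sketch below checks only those two conditions; it does not check maximality or $S_e \subseteq Q_e(G)$. The encoding (at most one pattern edge per node pair) and the function name are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # pattern edge (u, u'); assumes at most one edge per node pair
Pair = Tuple[str, str]  # data node pair (v, v')


def satisfies_pq_conditions(edges: List[Edge], S: Dict[Edge, Set[Pair]]) -> bool:
    """Check the two PQ conditions for a candidate assignment e -> S_e."""
    def has_source(e: Edge, v: str) -> bool:
        return any(x == v for x, _ in S[e])

    for u1, u2 in edges:
        for v1, v2 in S[(u1, u2)]:
            for e2 in edges:
                # Condition 1: e2 = (u2, u3) continues e1, so v2 needs a pair in S_e2.
                if e2[0] == u2 and not has_source(e2, v2):
                    return False
                # Condition 2: e2 = (u1, u3) shares e1's source, so v1 needs a pair in S_e2.
                if e2[0] == u1 and not has_source(e2, v1):
                    return False
    return True


edges = [("Alice", "Bio"), ("Bio", "Doc")]
good = {("Alice", "Bio"): {("alice", "b1")}, ("Bio", "Doc"): {("b1", "d1")}}
bad = {("Alice", "Bio"): {("alice", "b1"), ("alice", "b2")}, ("Bio", "Doc"): {("b1", "d1")}}
print(satisfies_pq_conditions(edges, good), satisfies_pq_conditions(edges, bad))  # True False
```

In `bad`, the pair (alice, b2) violates the first condition because b2 has no pair for the edge (Bio, Doc).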
Reachability and graph pattern queries: examples
[Figure: an example RQ and PQ over the query nodes Job=‘biologist’, sp=‘cloning’; Job=‘doctors’, dsp=‘cloning’; Id=‘Alice’, with edges fa+, fa<=2, sa<=2, sn, fn, matched against a data graph whose edges are colored fa, fn, sa, sn]
Fundamental problems: query containment
✓ A PQ $Q_1 = (V_1, E_1, f_{v_1}, f_{e_1})$ is contained in $Q_2 = (V_2, E_2, f_{v_2}, f_{e_2})$ if there exists a mapping $\lambda$ from $E_1$ to $E_2$ such that for any data graph $G$ and any edge $e \in E_1$, $S_e \subseteq S_{\lambda(e)}$; i.e., $\lambda$ is a renaming function under which $Q_1(G)$ is mapped into $Q_2(G)$
✓ Query containment and equivalence can both be determined in cubic time
  • query similarity, based on a revision of graph simulation
  • query similarity can be determined in cubic time
Query containment and equivalence for PQs can be solved efficiently
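One natural ingredient of a containment test is checking, edge by edge, whether the language of one edge expression is contained in another's (e.g., h<=1 is contained in h<=3, but not vice versa). For the subclass F this becomes easy once adjacent atoms of the same color are merged into one block. A sketch, assuming c<=k means 1 to k consecutive c-colored edges and c+ means one or more; the query-level test (the revised simulation between $Q_1$ and $Q_2$) and node predicates are not covered, and the names are hypothetical.

```python
from typing import List, Optional, Tuple

# (color, min repeats, max repeats or None for "+"); c = (c,1,1), c<=k = (c,1,k), c+ = (c,1,None).
Atom = Tuple[str, int, Optional[int]]


def normalize(expr: List[Atom]) -> List[Atom]:
    """Merge adjacent atoms of one color: c[a,b] followed by c[a2,b2] is c[a+a2, b+b2]."""
    blocks: List[Atom] = []
    for color, lo, hi in expr:
        if blocks and blocks[-1][0] == color:
            _, prev_lo, prev_hi = blocks[-1]
            new_hi = None if prev_hi is None or hi is None else prev_hi + hi
            blocks[-1] = (color, prev_lo + lo, new_hi)
        else:
            blocks.append((color, lo, hi))
    return blocks


def contained(f1: List[Atom], f2: List[Atom]) -> bool:
    """True iff every color sequence matching f1 also matches f2."""
    b1, b2 = normalize(f1), normalize(f2)
    # After merging, adjacent blocks differ in color, so a matching sequence splits into
    # color runs one-to-one with the blocks: compare colors, then run-length ranges.
    if [c for c, _, _ in b1] != [c for c, _, _ in b2]:
        return False
    for (_, lo1, hi1), (_, lo2, hi2) in zip(b1, b2):
        if lo1 < lo2:
            return False
        if hi2 is not None and (hi1 is None or hi1 > hi2):
            return False
    return True


h1, h3 = [("h", 1, 1)], [("h", 1, 3)]  # h<=1 and h<=3
print(contained(h1, h3), contained(h3, h1))  # True False
```

The merge step matters: without it, `a a` would not be recognized as contained in `a+`.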
Query containment: example
[Figure: pattern queries Q1, Q2, Q3 with nodes B1–B3 and C1–C6 and edges labeled h<=1, h<=2, h<=3]
Q2 is contained in Q1; Q1 and Q3 are equivalent
Fundamental problems: query minimization
✓ Size of a query: $|V_p| + |E_p|$
✓ Query minimization problem
  • input: a PQ $Q_p$
  • output: a minimized PQ $Q_m$ equivalent to $Q_p$
✓ Query minimization can be solved in cubic time in the size of the query:
  • compute the maximum node equivalence classes, based on a revision of graph simulation;
  • determine the number of redundant nodes and edges based on the equivalence classes;
  • remove redundant and isolated nodes and edges
Query minimization for PQs can be solved efficiently
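The first step, computing equivalence classes of query nodes, can be sketched as mutual simulation of a query with itself. This simplified sketch compares node predicates and edge expressions by string equality (a fuller version would presumably use predicate implication and expression containment) and omits the removal steps; the names and encoding are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def node_equivalence_classes(preds: Dict[str, str],
                             edges: List[Tuple[str, str, str]]) -> List[Set[str]]:
    """Group query nodes that simulate each other (simplified revised simulation)."""
    out: Dict[str, Set[Tuple[str, str]]] = {u: set() for u in preds}
    for u, w, expr in edges:
        out[u].add((w, expr))
    # Simplification: predicates and edge expressions are compared by string equality.
    sim = {(u, x) for u in preds for x in preds if preds[u] == preds[x]}
    changed = True
    while changed:
        changed = False
        for u, x in list(sim):
            # x simulates u if every edge (u, w, f) is mirrored by some (x, y, f) with (w, y) in sim.
            if not all(any(g == f and (w, y) in sim for y, g in out[x]) for w, f in out[u]):
                sim.discard((u, x))
                changed = True
    # Nodes that simulate each other fall into the same class.
    classes: List[Set[str]] = []
    seen: Set[str] = set()
    for u in sorted(preds):
        if u not in seen:
            cls = {x for x in preds if (u, x) in sim and (x, u) in sim}
            seen |= cls
            classes.append(cls)
    return classes


preds = {"R": "R", "B1": "B", "B2": "B", "C1": "C", "C2": "C"}
edges = [("R", "B1", "f"), ("R", "B2", "f"), ("B1", "C1", "h<=2"), ("B2", "C2", "h<=2")]
print(node_equivalence_classes(preds, edges))  # B1/B2 and C1/C2 fall into shared classes
```

Because the maximum self-simulation is reflexive and transitive, mutual simulation is an equivalence relation, so the classes above are well defined.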
Query minimization: example
[Figure: pattern queries Q1, Q2, Q3 over nodes labeled R, B and C, with edges labeled f, g, g<=3 and h<=2]
Evaluating graph pattern queries
✓ PQs can be answered in cubic time
  • Join-based algorithm JoinMatch
    § matrix index vs. distance cache
    § a join operation for each edge in the PQ, repeated until a fixpoint is reached (w.r.t. a reversed topological order)
  • Split-based algorithm SplitMatch
    § blocks: treating pattern nodes and data nodes uniformly
    § partition-relation pairs
Graph pattern matching can be solved in polynomial time
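One possible reading of the matrix index is a per-color table of shortest nonempty path lengths, after which single-atom checks (c, c<=k, c+) become lookups. A sketch under that assumption; it handles single atoms only (concatenations FF would need composition), and the paper's actual index and its distance-cache alternative may differ. Names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def color_distance_matrix(edges: List[Tuple[str, str, str]],
                          color: str) -> Dict[str, Dict[str, int]]:
    """dist[v][w] = length of the shortest nonempty path from v to w using only `color` edges."""
    succ: Dict[str, List[str]] = {}
    for v, w, c in edges:
        if c == color:
            succ.setdefault(v, []).append(w)
    dist: Dict[str, Dict[str, int]] = {}
    for src in succ:
        d: Dict[str, int] = {}
        queue: deque = deque()
        # Seed with successors at distance 1, so dist[v][v] records the shortest cycle.
        for w in succ[src]:
            if w not in d:
                d[w] = 1
                queue.append(w)
        while queue:
            v = queue.popleft()
            for w in succ.get(v, []):
                if w not in d:
                    d[w] = d[v] + 1
                    queue.append(w)
        dist[src] = d
    return dist


def atom_holds(dist: Dict[str, Dict[str, int]], v: str, w: str, bound: Optional[int]) -> bool:
    """c<=k if bound=k (plain c is bound=1), c+ if bound=None; single atoms only."""
    d = dist.get(v, {}).get(w)
    return d is not None and (bound is None or d <= bound)


edges = [("a", "b", "fa"), ("b", "c", "fa"), ("c", "a", "fa"), ("a", "c", "fn")]
fa = color_distance_matrix(edges, "fa")
print(atom_holds(fa, "a", "c", 2), atom_holds(fa, "a", "c", 1), fa["a"]["a"])  # True False 3
```

Precomputing one table per color trades memory (quadratic in |V| per color) for constant-time checks, which is presumably why a lazily filled distance cache is offered as the alternative.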
Example of JoinMatch
[Figure: the example PQ (nodes Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’, dsp=‘cloning’; edges fa+, fa<=2, sa<=2, sn, fn) next to a data graph with edges colored fa, fn, sa, sn]
Step 1: identify the candidates for each query node
Example of JoinMatch
[Figure: the same PQ and data graph]
Step 2: filter the candidate sets for each query edge
Example of JoinMatch
[Figure: the same PQ and data graph]
Step 3: return the final result
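The three steps of the walk-through can be sketched as a fixpoint computation, assuming the candidate pairs for each pattern edge (data node pairs connected by a path matching its expression) have already been computed, e.g. with a reachability index. This is a simplified sketch, not the authors' JoinMatch: it ignores the reversed topological order and the matrix/cache machinery, and node predicates are hypothetical callables on node ids.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]  # pattern edge (u, u')
Pair = Tuple[str, str]  # data node pair (v, v')


def join_match(pattern_edges: List[Edge],
               node_pred: Dict[str, Callable[[str], bool]],
               edge_pairs: Dict[Edge, Set[Pair]]) -> Dict[Edge, Set[Pair]]:
    # Step 1: keep only pairs whose endpoints are candidates of the query nodes.
    S = {e: {(v1, v2) for v1, v2 in edge_pairs[e]
             if node_pred[e[0]](v1) and node_pred[e[1]](v2)}
         for e in pattern_edges}

    def has_source(e: Edge, v: str) -> bool:
        return any(x == v for x, _ in S[e])

    # Step 2: filter each edge's pairs against neighbouring edges until a fixpoint.
    changed = True
    while changed:
        changed = False
        for u1, u2 in pattern_edges:
            keep = {(v1, v2) for v1, v2 in S[(u1, u2)]
                    if all(has_source(e2, v2) for e2 in pattern_edges if e2[0] == u2)
                    and all(has_source(e2, v1) for e2 in pattern_edges if e2[0] == u1)}
            if keep != S[(u1, u2)]:
                S[(u1, u2)] = keep
                changed = True
    # Step 3: return the final result.
    return S


edges = [("Alice", "Bio"), ("Bio", "Doc")]
preds = {"Alice": lambda v: v == "alice",
         "Bio": lambda v: v.startswith("b"),
         "Doc": lambda v: v.startswith("d")}
pairs = {("Alice", "Bio"): {("alice", "b1"), ("alice", "b2"), ("alice", "x")},
         ("Bio", "Doc"): {("b1", "d1")}}
print(join_match(edges, preds, pairs))
# x is dropped in step 1 (fails the Bio predicate); b2 in step 2 (no Doc pair)
```

Removing a pair can only invalidate other pairs, never restore them, so repeated deletion converges to the unique maximum assignment regardless of the order in which edges are visited.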
Experimental results – effectiveness of PQs
[Figure: effectiveness of PQs]
Effectiveness of PQs: edge-to-path relations
Experimental results – querying real-life graphs
[Figure panels: varying |Vp|; varying |Ep|]
Average query size: (|V|, |E|, |pred|, |c|, |b|) = (8, 15, 3, 4, 5)
The evaluation algorithms are sensitive to the number of pattern edges
Experimental results – querying real-life graphs
[Figure panels: varying |pred|; varying b]
The algorithms are sensitive to the number of predicates
Experimental results – querying synthetic graphs
[Figure panels: varying b; varying |V| ($\times 10^5$)]
The algorithms scale well over large synthetic graphs
Experimental results – querying synthetic graphs
[Figure panels: varying $\alpha$, where $|E| = |V|^{\alpha}$; varying cr, where $|sim(u)| \le |V| \cdot cr$]
The algorithms scale well over large synthetic graphs
Conclusion
✓ Simulation revised for graph pattern matching
  • reachability queries and graph pattern queries
    § query containment and minimization: cubic time
    § query evaluation: cubic time
✓ Future work
  • extending RQs and PQs to support general regular expressions
  • incremental evaluation of RQs and PQs
Simulation revised for graph pattern matching
Thank you! Q&A
[Figure: Terrorist Collaboration Network (1970–2010)]
“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)