- Slides: 29
Answering pattern queries using views
Wenfei Fan (University of Edinburgh), Xin Wang (Southwest Jiaotong University), Yinghui Wu (UC Santa Barbara)
Real-life graph querying is expensive
Real-life scope:
◦ 100 M (10^8): social scale
◦ 100 B (10^11): Web scale
◦ 1 T (10^12): brain scale, 100 T (10^14)
"An NSA Big Graph experiment", P. Burkhardt et al., U.S. National Security Agency, May 2013 2
Querying a collaborative network: expensive!
[Figure: a collaborative pattern (query 1 and query 2 over project manager, developer, customer and tester nodes) and a collaborative (chat) network with nodes PM 1, PM 2, customer 1 … customer n and developer 1 … developer k.]
“Detecting Coordination Problems in Collaborative Software Development Environments”, Amrit Chintan et al., Information Systems Management, 2010 3
Answering queries using views
◦ Direct evaluation: query Q on database D gives the query result Q(D)
◦ Using views: query A on the views V(D) gives the query result A(V)
◦ Questions: When? What to choose? How to evaluate?
◦ Query classes studied (years on the slide's timeline: 1995, 1998, 2000, 2002, 2006, 2007, 2011): relational algebra, regular path queries, tree pattern queries (XPath), XML, RDF/SPARQL
◦ Our work: graph pattern queries via (bounded) simulation 4
Outline
Graph pattern matching using views
◦ When, what and how?
When can a query be evaluated using views?
◦ Pattern containment: an iff condition
How to evaluate?
◦ Query answering using views
What to choose?
◦ Minimum containment & minimal containment
Extension: bounded simulation
Experimental study
Conclusion 5
Graphs, patterns and views
◦ View definition: a pattern query
◦ View extension: the query result, a binary relation of matches
◦ Node match: a node that satisfies the predicates of a pattern node
◦ Edge match: an edge that connects two node matches
View 1 (customer, developer): view extension 1
◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)}
◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
View 2 (project manager, customer, developer): view extension 2
◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)} 6
Graph pattern matching using views
Given a pattern query Q and a set V of view definitions, find another query A such that
◦ A is equivalent to Q (A(G) = Q(G)) for all data graphs G
◦ A only refers to V and the extensions V(G)
[Figure: query Q on data graph G gives matches Q(G); query A on views V gives A(G).] 7
When can a pattern query be answered using views? 8
Pattern containment
Query (project manager, customer, developer): query result
◦ (project manager, developer): (PM 1, developer 2)
◦ (project manager, customer): (PM 1, customer 2)
◦ (developer, customer): (developer 2, customer 2)
◦ (customer, developer): (customer 2, developer 2)
View extensions (View 1 and View 2)
◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)}
◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)}
◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)} 9
Determining Pattern containment 10
Pattern containment: example
[Figure: the query (project manager, customer, developer) is treated as a “data graph”; View 1 and View 2 are matched against it, and the view matches give the mapping λ.] 11
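The idea on this slide can be sketched in code. The following is a minimal sketch, not the authors' implementation: it assumes plain graph simulation with label equality as the node predicate, and it assumes that containment holds when the view matches on the query together cover every query edge. The names `simulate`, `containment_mapping`, `V_pm` and `V_cd` and the shapes of the two views are hypothetical.

```python
def simulate(p_labels, p_edges, g_labels, g_edges):
    """Match pattern P against graph G by graph simulation (assumed semantics).

    Returns {pattern edge: set of matching graph edges}; all sets are empty
    if some pattern node has no match.
    """
    # Start from all graph nodes that carry the same label as the pattern node.
    sim = {u: {v for v, lab in g_labels.items() if lab == p_lab}
           for u, p_lab in p_labels.items()}
    changed = True
    while changed:
        changed = False
        for (u, u2) in p_edges:
            for v in list(sim[u]):
                # v must have an edge to some match of u2, otherwise drop it.
                if not any((v, w) in g_edges for w in sim[u2]):
                    sim[u].discard(v)
                    changed = True
    if any(not s for s in sim.values()):
        return {e: set() for e in p_edges}
    return {(u, u2): {(v, w) for (v, w) in g_edges
                      if v in sim[u] and w in sim[u2]}
            for (u, u2) in p_edges}


def containment_mapping(q_labels, q_edges, views):
    """Treat the query as a data graph and collect view matches on it.

    Returns lambda as {query edge: set of (view name, view edge)}, or None
    if some query edge is covered by no view (query not contained).
    """
    lam = {e: set() for e in q_edges}
    for name, (v_labels, v_edges) in views.items():
        for v_edge, matched in simulate(v_labels, v_edges, q_labels, q_edges).items():
            for e in matched:
                lam[e].add((name, v_edge))
    return lam if all(lam.values()) else None


# Query from the slides; the node ids "pm", "cust", "dev" are illustrative.
q_labels = {"pm": "project manager", "cust": "customer", "dev": "developer"}
q_edges = {("pm", "dev"), ("pm", "cust"), ("dev", "cust"), ("cust", "dev")}

# Hypothetical view definitions, guessed from the view extensions on the slides.
views = {
    "V_pm": ({"p": "project manager", "c": "customer", "d": "developer"},
             {("p", "d"), ("p", "c")}),
    "V_cd": ({"c": "customer", "d": "developer"},
             {("c", "d"), ("d", "c")}),
}

lam = containment_mapping(q_labels, q_edges, views)
```

With both views every query edge is covered, so `lam` is a full mapping; with only `V_cd` the two project manager edges stay uncovered and the function returns `None`.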
How to answer pattern queries using views? 12
Query evaluation using views
Given Q, a set of views V with their extensions, and a mapping λ, find the query result Q(G)
Algorithm
◦ Collect edge matches for each query edge e from the extensions of the view edges in λ(e)
◦ Iteratively remove non-matches until no change happens
◦ Return Q(G) 13
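The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes plain graph simulation, so a match is a non-match when one of its endpoints lacks a surviving match for some outgoing query edge, and it assumes the result is empty when any query edge loses all matches. The function name `evaluate_with_views` and the view names `V_pm` and `V_cd` are hypothetical; the view extensions are the ones listed on the slides.

```python
def evaluate_with_views(query_edges, lam, extensions):
    """Compute edge matches of the query from view extensions only."""
    # Step 1: collect candidate matches for each query edge e from lambda(e).
    S = {e: set() for e in query_edges}
    for e in query_edges:
        for view, view_edge in lam[e]:
            S[e] |= extensions[view][view_edge]

    def supported(u, v):
        # v can match query node u only if every query edge leaving u
        # still has a candidate match leaving v (assumed simulation rule).
        return all(any(src == v for src, _ in S[(a, b)])
                   for (a, b) in query_edges if a == u)

    # Step 2: remove non-matches until nothing changes.
    changed = True
    while changed:
        changed = False
        for (u, u2) in query_edges:
            for (v, v2) in list(S[(u, u2)]):
                if not (supported(u, v) and supported(u2, v2)):
                    S[(u, u2)].discard((v, v2))
                    changed = True

    # Step 3: return Q(G); empty if some query edge has no match left.
    if any(not m for m in S.values()):
        return {e: set() for e in query_edges}
    return S


query_edges = [("pm", "dev"), ("pm", "cust"), ("dev", "cust"), ("cust", "dev")]

# View extensions as listed on the slides; view names are illustrative.
extensions = {
    "V_pm": {
        ("pm", "dev"): {("PM 1", "developer 2"), ("PM 2", "developer 3")},
        ("pm", "cust"): {("PM 1", "customer 2"), ("PM 2", "customer 2")},
    },
    "V_cd": {
        ("cust", "dev"): {("customer 2", "developer 2"), ("customer 3", "developer 3")},
        ("dev", "cust"): {("developer 2", "customer 2"), ("developer 2", "customer 3"),
                          ("developer 3", "customer 2")},
    },
}

# lambda maps each query edge to the view edges whose matches cover it.
lam = {
    ("pm", "dev"): {("V_pm", ("pm", "dev"))},
    ("pm", "cust"): {("V_pm", ("pm", "cust"))},
    ("cust", "dev"): {("V_cd", ("cust", "dev"))},
    ("dev", "cust"): {("V_cd", ("dev", "cust"))},
}

result = evaluate_with_views(query_edges, lam, extensions)
```

The data graph G is never touched: only the cached view extensions are read. On this data no candidate is removed under the assumed rule; if (customer 3, developer 3) were missing from the extension, customer 3 would lose its only outgoing match and (developer 2, customer 3) would be pruned in turn.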
Query evaluation using views: “bottom-up” strategy
View extensions (View 1 and View 2)
◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)}
◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)}
◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
Query (project manager, customer, developer): query result
◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)}
◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)} 14
What should be selected? 15
What to choose? Choose all?
[Figure: a query and six candidate views (view 1 to view 6) over project manager, developer, customer, tester and software nodes.] 16
Minimum containment 17
A log|Ep|-approximation 18
Minimum containment: example
[Figure: the query and candidate views 1 to 6 over project manager, developer, customer, tester and software nodes, with the edge set labeled Ec.] 19
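The slides give the approximation bound but not the algorithm. A minimal sketch, assuming the selection behaves like the classic greedy set-cover heuristic (repeatedly pick the view whose matches cover the most still-uncovered query edges): the function name `greedy_minimum` and the edge and view names are hypothetical, and the cover sets are made up for illustration, not taken from the figure.

```python
def greedy_minimum(query_edges, covers):
    """Greedy set-cover style selection of views (an assumed strategy).

    covers maps each view name to the set of query edges its view matches
    cover. Returns a list of chosen views, or None if the query edges
    cannot all be covered.
    """
    uncovered = set(query_edges)
    chosen = []
    while uncovered:
        # Pick the view that covers the most edges not yet covered.
        best = max(covers, key=lambda v: len(covers[v] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            return None  # no view helps: the query is not contained
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical cover sets for five query edges e1..e5.
edges = ["e1", "e2", "e3", "e4", "e5"]
covers = {
    "v1": {"e1", "e2"},
    "v2": {"e3"},
    "v3": {"e1", "e2", "e3"},
    "v4": {"e4", "e5"},
    "v5": {"e5"},
}

selected = greedy_minimum(edges, covers)
```

On this data the sketch picks two views, where choosing all five would also contain the query.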
Minimal containment 20
Minimal containment: example
[Figure: the query and candidate views 1 to 6 over project manager, developer, customer, tester and software nodes.] 21
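A minimal set of views is one from which no view can be dropped without losing containment. A minimal sketch of one way to find such a set, assuming a simple two-pass strategy (add views while they cover new query edges, then drop views that became redundant); this is not necessarily the authors' algorithm, and the function name `minimal_containing`, the view names and the cover sets are hypothetical.

```python
def minimal_containing(query_edges, covers):
    """Return a set of views with no redundant member, or None.

    covers maps each view name to the set of query edges its view matches
    cover (illustrative input, assumed to be precomputed).
    """
    needed = set(query_edges)
    chosen, covered = [], set()
    # Pass 1: keep a view only if it covers some edge not covered so far.
    for view, edges in covers.items():
        if (edges & needed) - covered:
            chosen.append(view)
            covered |= edges & needed
        if covered == needed:
            break
    if covered != needed:
        return None  # the views do not contain the query
    # Pass 2: drop any view whose edges are covered by the remaining ones.
    for view in list(chosen):
        rest = set().union(*(covers[w] for w in chosen if w != view))
        if needed <= rest:
            chosen.remove(view)
    return chosen


# Hypothetical cover sets for four query edges e1..e4.
edges = ["e1", "e2", "e3", "e4"]
covers = {
    "v1": {"e1", "e2"},
    "v2": {"e2", "e3"},
    "v3": {"e3", "e4"},
    "v4": {"e1", "e2", "e3", "e4"},
}

minimal = minimal_containing(edges, covers)
```

Here pass 1 collects v1, v2 and v3, and pass 2 drops v2. The result is minimal but not minimum: the single view v4 would also contain the query, which is the gap the minimum containment problem targets.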
Bounded pattern matching using views
Bounded pattern queries
[Figure: a bounded collaborative pattern, View 1 and View 2, and a collaborative (chat) network.]
Answering bounded pattern queries
◦ Idea: “reduce” bounded pattern queries to weighted pattern queries
◦ View matches: weighted edge to weighted paths
◦ Complexity and algorithms carry over to bounded queries 22
Putting everything together
Results for pattern queries (problem: complexity; algorithm)
Simulation
◦ containment: PTIME; O(card(V)|Q|^2 + |V|^2 + |Q||V|)
◦ minimum containment: NP-c/APX-hard; log|Ep|-approximable, O(card(V)|Q|^2 + |V|^2 + |Q||V| + |Q|card(V)^(3/2))
◦ minimal containment: PTIME; O(card(V)|Q|^2 + |V|^2 + |Q||V|)
◦ evaluation: PTIME; O(|Q||V(G)| + |V(G)|^2)
Bounded simulation
◦ containment: PTIME; O(|Q|^2|V|)
◦ minimum containment: NP-c/APX-hard; log|Ep|-approximable, O(|Q|^2|V| + |Q|card(V)^(3/2))
◦ minimal containment: PTIME; O(|Q|^2|V|)
◦ evaluation: PTIME; O(|Q||V(G)| + |V(G)|^2)
Containment across query languages
◦ Relational: conjunctive query, relational algebra
◦ XML: XPath, XQuery
◦ Graph/RDF: RPQs, ECRPQs, (P)SPARQL, (bounded) pattern query
◦ Containment is PTIME for (bounded) pattern queries; the entries for the other languages are NP-c, co-NP-c, EXPTIME or undecidable 23
Experimental study 24
Efficiency: pattern queries
Youtube views
◦ “Music”; < 7 days
◦ “Sports”; rate > 4
◦ “Comedy”; view > 10 k
[Figure: running time over graphs with |E| = |V|^a.]
◦ 2.2 times and 1.75 times faster
◦ Greater improvement over denser graphs 25
Efficiency: bounded pattern queries
Amazon views
◦ “Books”; rating > 4
◦ “Music CD”; sales rank > 5000
◦ “DVD”; reviews > 1000
◦ 10 times and 7.1 times faster
◦ Greater improvement over larger graphs 26
Minimum vs. minimal
◦ Minimum takes slightly more time to find substantially smaller sets of views 27
Conclusion
◦ Pattern containment is tractable for (bounded) pattern queries
◦ Query evaluation using views is much more efficient for large graphs than its “batch” counterparts
The journey has just started…
◦ More features to select good views to cache?
◦ What if a query is not contained in the existing views?
◦ View-based subgraph queries? 28
Answering pattern query using views Thank you! 29