
Answering pattern queries using views
Wenfei Fan, University of Edinburgh
Xin Wang, Southwest Jiaotong University
Yinghui Wu, UC Santa Barbara

Real-life graph querying is expensive
Real-life scope:
◦ 100 M (10^8): social scale
◦ 100 B (10^11): Web scale
◦ 1 T (10^12): brain scale, 100 T (10^14)
Source: An NSA Big Graph experiment, P. Burkhardt et al., U.S. National Security Agency, May 2013

Querying collaborative network
[Figure: a collaborative pattern (query 1 and query 2, over project manager, developer, customer and tester nodes) matched against a collaborative (chat) network containing PM 1, PM 2, developer 1 … developer k and customer 1 … customer n; the matching is marked "expensive!"]
Source: “Detecting Coordination Problems in Collaborative Software Development Environments”, Amrit Chintan et al., Information System management, 2010

Answering queries using views
◦ Conventional evaluation: query Q over database D gives the query result Q(D)
◦ Using views: query A over views V(D) gives the query result A(V)
◦ When? What to choose? How to evaluate?
Query classes, with the years shown on the slide:
◦ relational algebra: 1995
◦ regular path queries: 1998, 2000
◦ tree pattern query, XPath: 2002
◦ XML: 2006, 2007
◦ RDF/SPARQL: 2011
◦ graph pattern query, (bounded) simulation: our work

Outline
• Graph pattern matching using views
  ◦ When, what and how?
• When can a query be evaluated using views?
  ◦ Pattern containment: an iff condition
• How to evaluate?
  ◦ Query answering using views
• What to choose?
  ◦ Minimum containment & minimal containment
• Extension: bounded simulation
• Experimental study
• Conclusion

Graphs, patterns and views
• A pattern query serves as a view definition; its query result is the view extension
• Matching is a binary relation
  ◦ node match: satisfies predicates
  ◦ edge match: connects two node matches
• View definition 1 (customer, developer) and view extension 1
  ◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)}
  ◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
• View definition 2 (project manager, customer, developer) and view extension 2
  ◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
  ◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)}

Graph pattern matching using views
Given a pattern query Q and a set V of view definitions, find another query A such that
◦ A is equivalent to Q (A(G) = Q(G)) for all data graphs G
◦ A only refers to V and the extensions V(G)
[Figure: query Q over data graph G gives the matches Q(G); query A over the views V gives A(G)]

When can a pattern query be answered using views?

Pattern containment
Query over project manager, customer and developer; View 1 and View 2
View matches:
◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)}
◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)}
◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
Query result:
◦ (project manager, developer): (PM 1, developer 2)
◦ (project manager, customer): (PM 1, customer 2)
◦ (developer, customer): (developer 2, customer 2)
◦ (customer, developer): (customer 2, developer 2)

Determining pattern containment

Pattern containment: example
[Figure: View 1 and View 2 are matched against the query (project manager, customer, developer), which is treated as a “data graph”; labels: view matches, mapping λ]
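The idea of treating the query as a “data graph” can be sketched in a few lines. This is a minimal sketch, not the authors' algorithm: it assumes the iff condition is that the view matches found in the query cover every query edge, and it assumes plain graph simulation with label equality as the only node predicate. The names `simulate` and `contained`, the node identifiers, and the view labels `V1` and `V2` are hypothetical.

```python
def simulate(p_nodes, p_edges, g_nodes, g_edges):
    """Maximum graph simulation of a pattern over a graph.

    Nodes are {id: label} dicts, edges are lists of (source, target).
    Returns {pattern_edge: set of graph edges}, or {} if there is no match.
    """
    succ = {}
    for a, b in g_edges:
        succ.setdefault(a, set()).add(b)
    # Start from all label-compatible candidates (assumed node predicate).
    sim = {u: {v for v, lab in g_nodes.items() if lab == p_nodes[u]}
           for u in p_nodes}
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # Keep v only if some successor of v can still match u2.
            keep = {v for v in sim[u] if succ.get(v, set()) & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    if any(not s for s in sim.values()):
        return {}
    return {(u, u2): {(v, v2) for v in sim[u]
                      for v2 in succ.get(v, set()) if v2 in sim[u2]}
            for u, u2 in p_edges}


def contained(q_nodes, q_edges, views):
    """Check the assumed condition: view matches in Q cover all edges of Q.

    Also returns lam, mapping each query edge to the (view, view edge)
    pairs that cover it.
    """
    lam = {}
    for name, (v_nodes, v_edges) in views.items():
        matches = simulate(v_nodes, v_edges, q_nodes, q_edges)
        for v_edge, covered in matches.items():
            for q_edge in covered:
                lam.setdefault(q_edge, []).append((name, v_edge))
    return set(lam) == set(q_edges), lam


# Hypothetical encoding of the query and the two views from the slides.
query_nodes = {"pm": "project manager", "dev": "developer", "cust": "customer"}
query_edges = [("pm", "dev"), ("pm", "cust"), ("dev", "cust"), ("cust", "dev")]
views = {
    "V1": ({"c": "customer", "d": "developer"}, [("c", "d"), ("d", "c")]),
    "V2": ({"p": "project manager", "d": "developer", "c": "customer"},
           [("p", "d"), ("p", "c")]),
}
ok, lam = contained(query_nodes, query_edges, views)
```

With both views the check succeeds; with `V1` alone the two project manager edges stay uncovered, so the check fails. The returned `lam` plays the role of the mapping λ used in query evaluation.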

How to answer a pattern query using views?

Query evaluation using views
Given Q, a set of views V with their extensions, and a mapping λ, find the query result Q(G).
Algorithm
◦ Collect edge matches for each query edge e and λ(e)
◦ Iteratively remove non-matches until no change happens
◦ Return Q(G)
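The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes plain simulation semantics, in which a node stays a match only while it has a surviving match for every outgoing query edge. The function name `evaluate`, the data layout, and the node names (`pm_a`, `dev_b`, and so on) are hypothetical and are not the data shown on the slides.

```python
def evaluate(query_edges, lam, extensions):
    """Answer a query from view extensions only.

    lam maps each query edge to a list of (view, view_edge) pairs;
    extensions[view][view_edge] is the set of node pairs cached for it.
    """
    # Step 1: collect candidate edge matches from the views.
    matches = {e: set() for e in query_edges}
    for e in query_edges:
        for view, v_edge in lam[e]:
            matches[e] |= extensions[view][v_edge]

    out_edges = {}
    for u, u2 in query_edges:
        out_edges.setdefault(u, []).append((u, u2))

    def valid(u, v):
        # v can match u only if it has a match for every out-edge of u.
        return all(any(src == v for src, _ in matches[e])
                   for e in out_edges.get(u, []))

    # Step 2: remove non-matches until nothing changes.
    changed = True
    while changed:
        changed = False
        for u, u2 in query_edges:
            keep = {(v, v2) for v, v2 in matches[(u, u2)]
                    if valid(u, v) and valid(u2, v2)}
            if keep != matches[(u, u2)]:
                matches[(u, u2)] = keep
                changed = True

    # Step 3: the result is empty if any query edge lost all its matches.
    if any(not m for m in matches.values()):
        return {e: set() for e in query_edges}
    return matches


# Hypothetical toy input (not the slide data).
query_edges = [("pm", "dev"), ("pm", "cust"), ("dev", "cust"), ("cust", "dev")]
lam = {
    ("pm", "dev"): [("V2", ("p", "d"))],
    ("pm", "cust"): [("V2", ("p", "c"))],
    ("dev", "cust"): [("V1", ("d", "c"))],
    ("cust", "dev"): [("V1", ("c", "d"))],
}
extensions = {
    "V1": {("c", "d"): {("cust_a", "dev_a")},
           ("d", "c"): {("dev_a", "cust_a")}},
    "V2": {("p", "d"): {("pm_a", "dev_a"), ("pm_b", "dev_b")},
           ("p", "c"): {("pm_a", "cust_a"), ("pm_b", "cust_a")}},
}
result = evaluate(query_edges, lam, extensions)
```

In this toy input `dev_b` has no customer edge in any view extension, so the pair (`pm_b`, `dev_b`) is removed first; `pm_b` then lacks a developer match, which removes (`pm_b`, `cust_a`) as well. The data graph itself is never accessed.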

Query evaluation using views: “bottom-up” strategy
Query over project manager, customer and developer; View 1 and View 2
Edge matches shown for the views and the query result:
◦ (project manager, developer): {(PM 1, developer 2), (PM 2, developer 3)}
◦ (project manager, customer): {(PM 1, customer 2), (PM 2, customer 2)}
◦ (developer, customer): {(developer 2, customer 2), (developer 2, customer 3), (developer 3, customer 2)}
◦ (customer, developer): {(customer 2, developer 2), (customer 3, developer 3)}

What should be selected?

What to choose? Choose all?
[Figure: a query over project manager, developer, customer, tester and software nodes, with six candidate views (view 1 to view 6) built from subsets of those nodes]

Minimum containment

A log|Ep|-approximation
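The slide does not spell out the approximation algorithm. A logarithmic factor of this kind is what the classic greedy set-cover strategy achieves, so the sketch below assumes that strategy: each view is reduced to the set of query edges it covers, and the view covering the most still-uncovered edges is picked repeatedly. The function name `greedy_minimum` and the edge and view names are hypothetical.

```python
def greedy_minimum(query_edges, view_cover):
    """Greedy set cover over query edges.

    view_cover maps a view name to the set of query edges it covers.
    Returns the chosen view names, or None if the views cannot cover Q.
    """
    uncovered = set(query_edges)
    chosen = []
    while uncovered:
        # Pick the view that covers the most edges still uncovered.
        best = max(view_cover, key=lambda name: len(view_cover[name] & uncovered))
        gain = view_cover[best] & uncovered
        if not gain:
            return None  # some query edge is covered by no view
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical cover sets for a four-edge query.
cover = {
    "V1": {"a", "b"},
    "V2": {"c"},
    "V3": {"a", "b", "c"},
    "V4": {"d"},
}
selection = greedy_minimum(["a", "b", "c", "d"], cover)
```

Here the greedy choice takes `V3` first because it covers three edges, then `V4` for the remaining edge.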

Minimum containment
[Figure: the query over project manager, developer, customer, tester and software nodes, with candidate views 1 to 6; label: Ec]

Minimal containment

Minimal containment
[Figure: the query over project manager, developer, customer, tester and software nodes, with candidate views 1 to 6]
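A minimal sketch of one way to obtain a minimal set of views, assuming "minimal" means that the selected views cover the query and that dropping any one of them breaks the cover. The slides do not give the procedure; the single-pass pruning below, the function name `minimal_views`, and the view names are assumptions for illustration.

```python
def minimal_views(query_edges, view_cover):
    """Drop redundant views one at a time until none can be removed.

    view_cover maps a view name to the set of query edges it covers.
    Returns a minimal covering list of view names, or None if no cover exists.
    """
    needed = set(query_edges)

    def covers(names):
        covered = set()
        for name in names:
            covered |= view_cover[name]
        return needed <= covered

    chosen = list(view_cover)
    if not covers(chosen):
        return None
    for name in list(chosen):
        rest = [n for n in chosen if n != name]
        if covers(rest):  # this view is redundant given the others
            chosen = rest
    return chosen


# Hypothetical cover sets: the large view is scanned first and dropped.
cover = {
    "Big": {"a", "b", "c"},
    "S1": {"a"},
    "S2": {"b"},
    "S3": {"c", "d"},
}
selection = minimal_views(["a", "b", "c", "d"], cover)
```

The result here keeps `S1`, `S2` and `S3`: no view can be removed from it, yet the two-view set `Big` plus `S3` is smaller. A minimal set is cheap to find in this way but need not be a minimum one.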

Bounded pattern matching using views
[Figure: bounded pattern queries: a collaborative pattern over project manager, tester, customer and developer nodes with edge bounds such as 2 and 3, View 1 and View 2, and a collaborative (chat) network]
Answering bounded pattern queries
◦ Idea: “reduce” bounded pattern queries to weighted pattern queries
◦ View matches: weighted edges to weighted paths
◦ Complexity and algorithms carry over to bounded queries

Putting everything together
(Bounded) pattern queries: complexity and algorithm per problem
◦ containment: PTIME; O(card(V)|Q|^2 + |V|^2 + |Q||V|) for simulation, O(|Q|^2|V|) for bounded simulation
◦ minimum containment: NP-c/APX-hard, log|Ep|-approximable; O(card(V)|Q|^2 + |V|^2 + |Q||V| + |Q|card(V)^(3/2)) for simulation, O(|Q|^2|V| + |Q|card(V)^(3/2)) for bounded simulation
◦ minimal containment: PTIME; O(card(V)|Q|^2 + |V|^2 + |Q||V|) for simulation, O(|Q|^2|V|) for bounded simulation
◦ evaluation: PTIME; O(|Q||V(G)| + |V(G)|^2)
Other query classes and languages
[Table: containment and evaluation complexity for relational (conjunctive queries, relational algebra), XML (XPath, XQuery) and graph/RDF (RPQs, ECRPQs, (P)SPARQL) queries; entries shown include NP-c, co-NP-c, PTIME, EXPTIME and undecidable]

Experimental study

Efficiency: pattern queries
◦ Dataset: YouTube; views use predicates such as “Music”; < 7 days, “Sports”; Rate > 4, and Comedy; View > 10 k
◦ 2.2 times and 1.75 times faster
◦ Greater improvement over denser graphs (density axis: |E| = |V|^a)

Efficiency: bounded pattern queries
◦ Dataset: Amazon; views use predicates such as “Books”; rating > 4, “Music CD”; sales rank > 5000, and “DVD”; reviews > 1000
◦ 10 times and 7.1 times faster
◦ Greater improvement over larger graphs

Minimum vs. Minimal
Minimum takes slightly more time to find substantially smaller sets of views

Conclusion
◦ Pattern containment is tractable for (bounded) pattern queries
◦ Query evaluation using views is much more efficient for large graphs than its “batch” counterparts
The journey has just started…
◦ More features to select good views to cache?
◦ What to do when a query is not contained in the existing views?
◦ View-based subgraph queries?

Answering pattern query using views
Thank you!