Minimizing View Sets without Losing Query-Answering Power

Chen Li, Stanford University
Joint work with Mayank Bawa and Jeff Ullman
ICDT 2001, London, UK

A web-caching scenario

[Figure: web-caching architecture. Labels: user query, Client, answer, source query, cache, Server.]

Client

Cached query results:
    Q1(T, A, Pr) :- book(T, A, Pub, Pr)
    Q2(T, A, Pr) :- book(T, A, prenhall, Pr)
    Q3(A1, A2) :- book(T, A1, prenhall, Pr1), book(T, A2, prenhall, Pr2)

Source relation: Book(Title, Author, Pub, Price)

What query results to remove?

Source relation: Book(Title, Author, Pub, Price)

Cached query results:
    Q1(T, A, Pr) :- book(T, A, Pub, Pr)
    Q2(T, A, Pr) :- book(T, A, prenhall, Pr)
    Q3(A1, A2) :- book(T, A1, prenhall, Pr1), book(T, A2, prenhall, Pr2)

• Q2 ⊆ Q1
• Remove Q2? Then we cannot answer the query:
    Q(T, Pr) :- book(T, smith, prenhall, Pr)

How about removing Q3?

Source relation: Book(Title, Author, Pub, Price)

Cached query results:
    Q1(T, A, Pr) :- book(T, A, Pub, Pr)
    Q2(T, A, Pr) :- book(T, A, prenhall, Pr)
    Q3(A1, A2) :- book(T, A1, prenhall, Pr1), book(T, A2, prenhall, Pr2)

Compute Q3 using Q2:
    Q3(A1, A2) :- Q2(T, A1, Pr1), Q2(T, A2, Pr2)

We are not losing any query-answering power!
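The rewriting of Q3 over Q2 can be checked on a small instance. The sketch below is only an illustration: the rows of `book` are made up, and the function names are hypothetical. It evaluates Q3 once against the source relation and once against the cached result of Q2, then compares the two.

```python
# Toy instance of Book(Title, Author, Pub, Price); the rows are invented.
book = {
    ("db", "smith", "prenhall", 50),
    ("db", "jones", "prenhall", 50),
    ("ai", "lee", "prenhall", 60),
    ("os", "kim", "mit", 40),
}

def q2(book):
    # Q2(T, A, Pr) :- book(T, A, prenhall, Pr)
    return {(t, a, pr) for (t, a, pub, pr) in book if pub == "prenhall"}

def q3_from_base(book):
    # Q3(A1, A2) :- book(T, A1, prenhall, Pr1), book(T, A2, prenhall, Pr2)
    return {(a1, a2)
            for (t1, a1, pub1, _) in book
            for (t2, a2, pub2, _) in book
            if t1 == t2 and pub1 == pub2 == "prenhall"}

def q3_from_q2(q2_rows):
    # Q3(A1, A2) :- Q2(T, A1, Pr1), Q2(T, A2, Pr2)
    return {(a1, a2)
            for (t1, a1, _) in q2_rows
            for (t2, a2, _) in q2_rows
            if t1 == t2}

cached_q2 = q2(book)
same_answer = q3_from_base(book) == q3_from_q2(cached_q2)
```

Agreement on one instance does not prove equivalence; it only illustrates why the cached Q3 can be dropped once Q2 is kept.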

• Observations:
  – Traditional query containment does not help [Chandra and Merlin, 1977].
  – We should consider query-answering power.
• General questions:
  – How to describe “query-answering power”?
  – How to minimize a view set without losing its query-answering power?

Rest of the talk

• Answering queries using views
• Query-answering power
  – p-containment
  – Relationship with traditional query containment
  – Minimizing a view set
• p-containment relative to a set of queries
• Conclusion and open problems

Answering queries using views

• Conjunctive queries and views:
    h(X) :- g1(X1), …, gn(Xn)
• Example:
    V1(T, A, Pr) :- book(T, A, Pub, Pr)
    V2(T, A, Pr) :- book(T, A, prenhall, Pr)
    V3(A1, A2) :- book(T, A1, prenhall, Pr1), book(T, A2, prenhall, Pr2)

Query answerability

• A query Q is answerable by a view set V if we can rewrite Q using the views in V [LMSS95].
• Example:
    V2(T, A, Pr) :- book(T, A, prenhall, Pr)
    V3(A1, A2) :- book(T, A1, prenhall, Pr1), book(T, A2, prenhall, Pr2)
  V3 is answerable by V2:
    V3(A1, A2) :- V2(T, A1, Pr1), V2(T, A2, Pr2)

Algorithms

• Bucket algorithm [LRO96]
• Inverse-rule algorithm [DG97, Qia96]
• MiniCon algorithm [PL00]
• SVB algorithm [Mit99]
• CoreCover algorithm [ALU00]

Testing whether a query is answerable by a set of views is NP-complete.

Views are expensive to maintain

• They require storage space.
• They need to be kept up-to-date.

We want to minimize a given view set while keeping its query-answering power.

p-containment

• A view set V is p-contained in another view set W if W can answer all the queries that are answerable by V.
  – “p” stands for “power.”
  – Denoted: V ⪯p W
• Two view sets V and W are equipotent if V ⪯p W and W ⪯p V.
  – They have the same power to answer queries.

Example:
    V1(T, A, Pr) :- book(T, A, Pub, Pr)
    V2(T, A, Pr) :- book(T, A, prenhall, Pr)
    V3(A1, A2) :- book(T, A1, prenhall, Pr1), book(T, A2, prenhall, Pr2)

{V1, V2, V3} ⪯p {V1, V2} and {V1, V2} ⪯p {V1, V2, V3}

Therefore {V1, V2, V3} and {V1, V2} are equipotent.

• Lemma: V ⪯p W iff each view in V can be answered by W.
  – Implies an algorithm for testing p-containment.
  – Assumes the view sets are finite.
• Theorem: Testing V ⪯p W is NP-complete.

p-containment and query containment

    V1(T, A, Pr) :- book(T, A, Pub, Pr)
    V2(T, A, Pr) :- book(T, A, prenhall, Pr)
    V3(A1, A2) :- book(T, A1, prenhall, Pr1), book(T, A2, prenhall, Pr2)

• Query containment does not imply p-containment.
  – Example: {V1} and {V2}. V2 is contained in V1 as a query, but V1 drops the Pub attribute, so V2 cannot be answered using V1.
• p-containment does not imply query containment.
  – Example: {V2} and {V3}. V3 is answerable by V2, but V3 is not contained in V2 as a query.
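For reference, traditional containment of conjunctive queries can be decided by searching for a containment mapping, the classical test of Chandra and Merlin. The sketch below is a brute-force illustration, not code from the talk. The query encoding is an assumption made for this example: a query is a `(head, body)` pair, variables start with an uppercase letter, and constants start with a lowercase letter.

```python
def is_var(term):
    # Assumed convention: variables are capitalized, constants are not.
    return term[0].isupper()

def extend(mapping, src, dst):
    """Extend `mapping` so that the term tuple `src` maps onto `dst`, or return None."""
    if len(src) != len(dst):
        return None
    new = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if new.setdefault(s, d) != d:
                return None
        elif s != d:  # constants must map to themselves
            return None
    return new

def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a containment mapping from q2 to q1 exists."""
    (head1, body1), (head2, body2) = q1, q2
    start = extend({}, head2, head1)
    if start is None:
        return False

    def search(i, mapping):
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred:
                nxt = extend(mapping, args, args1)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)

V1 = (("T", "A", "Pr"), [("book", ("T", "A", "Pub", "Pr"))])
V2 = (("T", "A", "Pr"), [("book", ("T", "A", "prenhall", "Pr"))])
V3 = (("A1", "A2"), [("book", ("T", "A1", "prenhall", "Pr1")),
                     ("book", ("T", "A2", "prenhall", "Pr2"))])
```

With these definitions, `contained_in(V2, V1)` holds while `contained_in(V1, V2)` does not, and V2 and V3 are incomparable because their heads have different arities. None of this says anything about which view set can answer more queries, which is the point of the slide.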

Minimizing a view set

• Keep removing views from the view set while retaining equipotence.
• There might be multiple equipotent minimal subsets:
    V1(A) :- r(A, B)
    V2(B) :- r(A, B)
    V3(A, B) :- r(A, X), r(Y, B)
  {V1, V2, V3} has two equipotent minimals: {V1, V2} and {V3}.
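The removal loop can be sketched as follows. By the lemma on p-containment, dropping a view keeps the set equipotent exactly when that view is answerable by the remaining views. The `answerable` function here is a hand-coded stand-in for the three views of this slide (V3 is the cross product of V1 and V2, and each of V1 and V2 is a projection of V3); it is an assumption for illustration, not a general answerability test, and `minimize` is a hypothetical name.

```python
def answerable(view, views):
    """Hand-coded oracle for V1, V2, V3 of this slide (illustrative assumption)."""
    if view in views:
        return True
    if view in ("V1", "V2"):
        return "V3" in views            # project V3 onto one column
    if view == "V3":
        return {"V1", "V2"} <= views    # cross product of V1 and V2
    return False

def minimize(views, order):
    """Drop views in the given order whenever the rest can still answer them."""
    current = set(views)
    for v in order:
        if answerable(v, current - {v}):
            current.remove(v)
    return current

all_views = {"V1", "V2", "V3"}
first = minimize(all_views, ["V3", "V1", "V2"])   # tries V3 first
second = minimize(all_views, ["V1", "V2", "V3"])  # tries V1 first
```

The two removal orders end in different subsets, `{"V1", "V2"}` and `{"V3"}`, which matches the two equipotent minimals on the slide.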

p-containment relative to queries

Queries: Q = {Q1, Q2, …}
Views: V = {V1, V2, …, Vm}, W = {W1, W2, …, Wn}

V is p-contained in W w.r.t. Q if the queries in Q that are answerable by V are also answerable by W.

Example of relative p-containment

Relations: car(Make, Dealer), loc(Dealer, City)

Queries:
    Q1(D, C) :- car(toyota, D), loc(D, C)
    Q2(D, C) :- car(honda, D), loc(D, C)

Views:
    V = {V1, V2}, with V1 = Q1 and V2 = Q2
    W = {W1}, with W1(M, D, C) :- car(M, D), loc(D, C)
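In this example both queries can be obtained from W1 by selecting on the make, which is what p-containment of V in W relative to {Q1, Q2} requires. The sketch below checks that on a small instance; the rows of `car` and `loc` and all function names are invented for the illustration.

```python
# Toy instances of car(Make, Dealer) and loc(Dealer, City); rows are invented.
car = {("toyota", "d1"), ("honda", "d2"), ("ford", "d3")}
loc = {("d1", "sf"), ("d2", "pa"), ("d3", "sf")}

def w1(car, loc):
    # W1(M, D, C) :- car(M, D), loc(D, C)
    return {(m, d, c) for (m, d) in car for (d2, c) in loc if d == d2}

def q_make(car, loc, make):
    # Q(D, C) :- car(make, D), loc(D, C), with make = toyota (Q1) or honda (Q2)
    return {(d, c) for (m, d) in car if m == make
            for (d2, c) in loc if d == d2}

def q_from_w1(w1_rows, make):
    # Rewriting over the view: Q(D, C) :- W1(make, D, C)
    return {(d, c) for (m, d, c) in w1_rows if m == make}

w1_rows = w1(car, loc)
q1_ok = q_from_w1(w1_rows, "toyota") == q_make(car, loc, "toyota")
q2_ok = q_from_w1(w1_rows, "honda") == q_make(car, loc, "honda")
```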

Testing relative p-containment

• Q is finite: test by the definition.
• Q is infinite?

Parameterized queries

• Motivation: web search forms.
• A parameterized query (PQ) is a conjunctive query with placeholders.
• Example:
    q(D) :- car($M, D), loc(D, $C)
  – The placeholders $M and $C are replaced by constants.
  – Instances:
      q(D) :- car(toyota, D), loc(D, sf)
      q(D) :- car(honda, D), loc(D, pa)
• The domain of each placeholder is infinite, so a PQ represents an infinite number of queries.
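Instantiating a PQ is a plain substitution of constants for placeholders. A minimal sketch, assuming a query is encoded as a `(head, body)` pair of term tuples and that placeholders are the terms starting with `$`; the encoding and the `instantiate` helper are hypothetical, chosen only for this example.

```python
def instantiate(pq, bindings):
    """Replace every $placeholder in the PQ by the constant bound to it."""
    head, body = pq

    def sub(term):
        return bindings[term] if term.startswith("$") else term

    return (tuple(sub(t) for t in head),
            [(pred, tuple(sub(t) for t in args)) for pred, args in body])

# q(D) :- car($M, D), loc(D, $C)
PQ = (("D",), [("car", ("$M", "D")), ("loc", ("D", "$C"))])

inst1 = instantiate(PQ, {"$M": "toyota", "$C": "sf"})
inst2 = instantiate(PQ, {"$M": "honda", "$C": "pa"})
```

Each choice of bindings yields one instance, so an unbounded domain of constants yields unboundedly many instances.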

Q: q(D) :- car($M, D), loc(D, $C)

• v1(M, D, C) :- car(M, D), loc(D, C)
  – Answers all instances of Q.
• v2(M, D) :- car(M, D), loc(D, sf)
  – Answers some instances of Q.
  – The answerable instances of Q are the instances of:
      q(D) :- car($M, D), loc(D, sf)
• v3(M) :- car(M, D), loc(D, sf)
  – Answers no instances of Q.
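The difference between the first two views can be seen on data. In the sketch below (invented rows, hypothetical function names), v1 keeps the city column and so supports any choice of $C, while v2 has the city fixed to sf and can only serve instances with $C = sf. The third view keeps no dealer column, so no instance of Q can be read off it.

```python
# Toy instances of car(Make, Dealer) and loc(Dealer, City); rows are invented.
car = {("toyota", "d1"), ("honda", "d2"), ("toyota", "d3")}
loc = {("d1", "sf"), ("d2", "pa"), ("d3", "pa")}

def q(car, loc, make, city):
    # Instance of q(D) :- car($M, D), loc(D, $C) with $M = make, $C = city
    return {d for (m, d) in car if m == make
            for (d2, c) in loc if d2 == d and c == city}

def v1(car, loc):
    # v1(M, D, C) :- car(M, D), loc(D, C)
    return {(m, d, c) for (m, d) in car for (d2, c) in loc if d == d2}

def v2(car, loc):
    # v2(M, D) :- car(M, D), loc(D, sf)
    return {(m, d) for (m, d) in car for (d2, c) in loc if d == d2 and c == "sf"}

def q_from_v1(rows, make, city):
    # q(D) :- v1(make, D, city)
    return {d for (m, d, c) in rows if m == make and c == city}

def q_from_v2(rows, make):
    # q(D) :- v2(make, D); only valid for instances with $C = sf
    return {d for (m, d) in rows if m == make}

v1_rows, v2_rows = v1(car, loc), v2(car, loc)
```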

• Assume the queries are generated by one PQ.
  – The results extend easily to a finite set of PQs.
• Complete answerability of a PQ using views:
  – V can answer all instances of a PQ Q.
  – Example:
      q(D) :- car($M, D), loc(D, $C)
      v1(M, D, C) :- car(M, D), loc(D, C)

An algorithm for testing complete answerability

• Replace each placeholder with a new distinct constant to get a canonical instance I.
• Test whether I is answerable by V.

Example:
    PQ: q(D) :- car($M, D), loc(D, $C)
    View: v1(M, D, C) :- car(M, D), loc(D, C)
    Canonical instance: q(D) :- car(m0, D), loc(D, c0)
    Rewriting: q(D) :- v1(m0, D, c0)
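The first step, building the canonical instance, can be sketched as below. The `(head, body)` encoding, the `$` prefix for placeholders, and the naming scheme for fresh constants are assumptions made for this example. The second step, testing whether the canonical instance is answerable by V, needs one of the rewriting algorithms and is not attempted here.

```python
def canonical_instance(pq):
    """Replace each placeholder by a fresh constant not already used in the PQ."""
    head, body = pq
    used = set(head) | {t for _, args in body for t in args}
    fresh = {}

    def sub(term):
        if not term.startswith("$"):
            return term
        if term not in fresh:
            i = 0
            # Pick a name such as m0 or c0 that does not clash with existing terms.
            while f"{term[1:].lower()}{i}" in used:
                i += 1
            fresh[term] = f"{term[1:].lower()}{i}"
            used.add(fresh[term])
        return fresh[term]

    instance = (tuple(sub(t) for t in head),
                [(pred, tuple(sub(t) for t in args)) for pred, args in body])
    return instance, fresh

# q(D) :- car($M, D), loc(D, $C)
PQ = (("D",), [("car", ("$M", "D")), ("loc", ("D", "$C"))])
instance, fresh = canonical_instance(PQ)
```

For the PQ of the slide this produces the instance q(D) :- car(m0, D), loc(D, c0), with `fresh` recording which constant stands for which placeholder.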

Partial answerability

• Some instances of Q are answerable by V:
    q(D) :- car($M, D), loc(D, $C)
    v2(M, D) :- car(M, D), loc(D, sf)
• Theorem: All the instances of a PQ that are answerable using V are instances of a finite set of PQs, each of which is completely answerable by V.
  – In the example above, that set contains the single PQ:
      q(D) :- car($M, D), loc(D, sf)

[Figure: within the set of all instances of a parameterized query Q, the instances answerable by V = {V1, …, Vn} are covered by the PQs PQ1, PQ2, …, PQk.]

An algorithm for finding the finite set of PQs.

Testing p-containment w.r.t. a PQ

To test whether V is p-contained in W w.r.t. the instances of a PQ Q:

• Find the PQs whose instances are all the instances of Q that are answerable by V.
• For each of these PQs, test whether it is completely answerable by W.
• Details are in the paper.

Conclusion

• Introduced p-containment, which is different from query containment.
• Showed how to minimize a view set without losing query-answering power.
• Developed an algorithm for testing relative p-containment w.r.t. instances of PQs.
• Extended to MCR-containment.

Open problems

• Find a view subset with the lowest “cost.”
• If views are not given, find the best views to materialize.