PODS 2012, ACM SIGMOD/PODS Conference, Scottsdale, Arizona, USA

A Dichotomy in the Complexity of Deletion Propagation with Functional Dependencies
Benny Kimelfeld, IBM Research – Almaden

Deletion Propagation
• Translate a tuple deletion on the view back to the source relations … properly
• Classic database problem
  – Specializes the more general view-update problem
  – [Dayal & Bernstein 1982; Cosmadakis & Papadimitriou 1984; Keller 1986; Cui & Widom 2001; Buneman, Khanna & Tan 2002; Cong, Fan & Geerts 2006; …]
• Renewed motivation: debugging/causality for false positives [K, Vondrák, Williams, 2011]
• Various definitions of “properly” have been studied
  – Minimize the view side effect (this work!): the number of view tuples lost besides the intended one
  – Minimize the source side effect: the number of source tuples to delete; this equals maximal “responsibility” for an answer [Meliou et al., 2010]

Example: File Access [Cui & Widom 2001; Buneman et al. 2002]
UserGroup        GroupFile        Access = UserGroup ⋈ GroupFile
user    group    group  file      user    file
Emma    ai       ai     a.txt     Emma    a.txt
Emma    db       ai     b.txt     Emma    b.txt
Olivia  os       db     a.txt     Olivia  a.txt
Olivia  db       db     b.txt     Olivia  b.txt
Jacob   ai       os     a.txt     Jacob   a.txt
                                  Jacob   b.txt
Access(u, f) :- UserGroup(u, g), GroupFile(g, f)
Delete source rows s.t. Emma can no longer access a.txt, but maintain maximum access permissions!
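As a quick check of the Access table, the following minimal Python sketch (our own illustration, not code from the talk) evaluates the view as a join of the two source relations on the shared group column. The names `USER_GROUP`, `GROUP_FILE`, and `access` are hypothetical.

```python
# The File Access instance from the slide (UserGroup and GroupFile tables).
USER_GROUP = {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
              ("Olivia", "db"), ("Jacob", "ai")}
GROUP_FILE = {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
              ("db", "b.txt"), ("os", "a.txt")}


def access(user_group, group_file):
    """Evaluate Access(u, f) :- UserGroup(u, g), GroupFile(g, f) as a join on g."""
    return {(u, f)
            for (u, g) in user_group
            for (g2, f) in group_file
            if g == g2}


for row in sorted(access(USER_GROUP, GROUP_FILE)):
    print(row)
```

Every user reaches both files through some group, so the view has six rows, including Jacob's two.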

Example: File Access (cont.) [Cui & Widom 2001; Buneman et al. 2002]
Access(u, f) :- UserGroup(u, g), GroupFile(g, f), over the same UserGroup and GroupFile instance.
This instance admits a side-effect-free (and hence minimal side effect) solution, e.g., delete UserGroup(Emma, ai) and GroupFile(db, a.txt): Emma loses access to a.txt, and every other access permission survives.
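To confirm that a side-effect-free solution exists here, a brute-force sketch can try every set of source deletions (2^10 sets on this instance) and keep one that removes (Emma, a.txt) while preserving the most view tuples. This is exponential and only illustrates the definitions on a toy instance; `best_deletion` and the other names are hypothetical.

```python
from itertools import combinations

USER_GROUP = {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
              ("Olivia", "db"), ("Jacob", "ai")}
GROUP_FILE = {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
              ("db", "b.txt"), ("os", "a.txt")}


def access(user_group, group_file):
    """Access(u, f) :- UserGroup(u, g), GroupFile(g, f)."""
    return {(u, f) for (u, g) in user_group for (g2, f) in group_file if g == g2}


def best_deletion(user_group, group_file, answer):
    """Exhaustively search source deletions that remove `answer` from the view.

    Returns (deleted tuples, surviving view) with the largest surviving view.
    Deletion sets are tried by increasing size and only strict improvements are
    kept, so the returned set is a smallest one among the optimal solutions.
    """
    source = ([("UserGroup", t) for t in sorted(user_group)]
              + [("GroupFile", t) for t in sorted(group_file)])
    best = None
    for k in range(len(source) + 1):
        for deleted in combinations(source, k):
            gone = set(deleted)
            view = access({t for t in user_group if ("UserGroup", t) not in gone},
                          {t for t in group_file if ("GroupFile", t) not in gone})
            if answer in view:
                continue  # not a solution: the undesired answer survives
            if best is None or len(view) > len(best[1]):
                best = (gone, view)
    return best


deleted, survivors = best_deletion(USER_GROUP, GROUP_FILE, ("Emma", "a.txt"))
side_effect = access(USER_GROUP, GROUP_FILE) - survivors - {("Emma", "a.txt")}
print("delete:", sorted(deleted))
print("side effect:", side_effect)  # an empty set means side-effect free
```

On this instance the search deletes UserGroup(Emma, ai) and GroupFile(db, a.txt) and reports an empty side effect.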

Formal Definitions
Schema S: relation symbols $R_1, \ldots, R_m$ + functional dependencies (FDs), each of the form $R_i$: attribute-set → attribute
Conjunctive Query (CQ) Q, e.g.:
$Q(y_1, y_2, y_3) \mathrel{:-} R_1(x_1, y_1), R_2(x_1, \text{'ibm'}), R_3(x_2, y_1, y_2, x_3), R_4(x_4, y_3)$
  – $y_1, y_2, y_3$ are head variables; $x_1, \ldots, x_4$ are existential variables; each $R_i(\ldots)$ is an atom
  – No self joins! (each relation symbol occurs in at most one atom)
Input:
• DB D over S
• Answer a ∈ Q(D) to delete
Solution: E ⊆ D s.t. a ∉ Q(E)
• Side-effect free: Q(E) = Q(D) \ {a}
• Optimal: |Q(E)| is maximal
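The definitions above can be made executable with a small backtracking evaluator. This is a sketch under our own assumptions: a database is a dict from relation name to a set of tuples, an atom is a (relation, terms) pair, and, as a convention invented for this sketch, constants are written quoted (e.g. `"'ibm'"`). All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, terms)
Database = Dict[str, Set[tuple]]


def is_const(term: str) -> bool:
    """Sketch convention (ours): constants are quoted, e.g. "'ibm'"."""
    return term.startswith("'")


def evaluate(head: Tuple[str, ...], atoms: List[Atom], db: Database) -> Set[tuple]:
    """Q(D): project every homomorphism from the body atoms into db onto the head."""
    answers: Set[tuple] = set()

    def extend(i: int, m: Dict[str, object]) -> None:
        if i == len(atoms):
            answers.add(tuple(m[v] for v in head))
            return
        rel, terms = atoms[i]
        for fact in db.get(rel, set()):
            m2 = dict(m)
            ok = len(fact) == len(terms)
            for term, value in zip(terms, fact):
                if not ok:
                    break
                if is_const(term):
                    ok = term[1:-1] == value
                else:
                    ok = m2.setdefault(term, value) == value
            if ok:
                extend(i + 1, m2)

    extend(0, {})
    return answers


def is_subinstance(e: Database, d: Database) -> bool:
    """E is contained in D, relation by relation."""
    return all(facts <= d.get(rel, set()) for rel, facts in e.items())


def is_solution(head, atoms, d: Database, e: Database, a: tuple) -> bool:
    """E is a sub-instance of D and a is no longer an answer on E."""
    return is_subinstance(e, d) and a not in evaluate(head, atoms, e)


def is_side_effect_free(head, atoms, d: Database, e: Database, a: tuple) -> bool:
    """A solution whose answers are exactly Q(D) minus {a}."""
    return (is_solution(head, atoms, d, e, a)
            and evaluate(head, atoms, e) == evaluate(head, atoms, d) - {a})


HEAD = ("u", "f")
ATOMS = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
D = {"UserGroup": {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
                   ("Olivia", "db"), ("Jacob", "ai")},
     "GroupFile": {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
                   ("db", "b.txt"), ("os", "a.txt")}}
E = {"UserGroup": D["UserGroup"] - {("Emma", "ai")},
     "GroupFile": D["GroupFile"] - {("db", "a.txt")}}
print(is_side_effect_free(HEAD, ATOMS, D, E, ("Emma", "a.txt")))
```

The usage lines check the File Access deletion of UserGroup(Emma, ai) and GroupFile(db, a.txt) against the side-effect-free definition.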

Complexity Questions
What is the complexity of
• Deciding whether a side-effect-free solution exists?
• Finding an optimal solution?
  – Or one with approximately minimal side effect?
  – Or one with approximately maximal # surviving answers?
  – Not the same [K, Vondrák, Williams, 2011]
Data complexity:
• Fixed: schema S, CQ Q
• Input: DB D over S, answer a ∈ Q(D) to delete

Unirelation Algorithm (1 Rel): Example [Buneman et al., 2002]
Access(u, f) :- UserGroup(u, g), GroupFile(g, f), over the File Access instance; delete a = (Emma, a.txt).
• Delete from UserGroup every tuple consistent with a, i.e., (Emma, ai) and (Emma, db): Emma also loses b.txt (1 extra answer lost)
• Delete from GroupFile every tuple consistent with a, i.e., (ai, a.txt), (db, a.txt), and (os, a.txt): Olivia and Jacob also lose a.txt (2 extra answers lost)
• The UserGroup deletion is better than the GroupFile deletion ⇒ selected solution
• Recall: there is an even better solution (side-effect free)

1 Rel: General Case
• Undesired answer a ∈ Q(D); Q has k atoms, over relations $R_1, R_2, \ldots, R_k$
• Solution i (i = 1, …, k): delete from $R_i$ each tuple consistent with a; all other relations of D stay unchanged
• Select the best of solutions 1, …, k
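The loop above can be sketched in a few lines of Python. We assume here that a tuple of $R_i$ is "consistent with a" when it agrees with a on every position holding a head variable, and that atoms contain only variables; `one_rel`, `consistent`, and `evaluate` are hypothetical names, not the paper's code.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variables)
Database = Dict[str, Set[tuple]]


def evaluate(head, atoms: List[Atom], db: Database) -> Set[tuple]:
    """Q(D) by backtracking over homomorphisms (variables only, no constants)."""
    answers = set()

    def extend(i, m):
        if i == len(atoms):
            answers.add(tuple(m[v] for v in head))
            return
        rel, terms = atoms[i]
        for fact in db.get(rel, set()):
            m2 = dict(m)
            if all(m2.setdefault(t, v) == v for t, v in zip(terms, fact)):
                extend(i + 1, m2)

    extend(0, {})
    return answers


def consistent(terms, fact, head, a) -> bool:
    """Assumed meaning: the fact agrees with a on every head-variable position."""
    binding = dict(zip(head, a))
    return all(binding[t] == v for t, v in zip(terms, fact) if t in binding)


def one_rel(head, atoms: List[Atom], db: Database, a: tuple):
    """1 Rel: for each atom R_i, delete every R_i-tuple consistent with a; keep the best."""
    best = None
    for rel, terms in atoms:  # no self joins: one atom per relation
        removed = {f for f in db[rel] if consistent(terms, f, head, a)}
        e = dict(db)
        e[rel] = db[rel] - removed
        view = evaluate(head, atoms, e)
        if best is None or len(view) > len(best[2]):
            best = (rel, removed, view)
    return best


ATOMS = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
D = {"UserGroup": {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
                   ("Olivia", "db"), ("Jacob", "ai")},
     "GroupFile": {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
                   ("db", "b.txt"), ("os", "a.txt")}}
rel, removed, view = one_rel(("u", "f"), ATOMS, D, ("Emma", "a.txt"))
print(rel, sorted(removed), len(view))
```

On the File Access instance the UserGroup candidate keeps 4 answers and the GroupFile candidate keeps 3, so the sketch selects the UserGroup deletion, matching the example slide. Every candidate is a solution: any derivation of a must map atom i to a tuple consistent with a, and those tuples are gone.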

Head Domination [K, Vondrák, Williams, 2011]
• Q: a CQ over a schema S
• G∃[Q]: nodes = atoms(Q); edges = “sharing ≥ 1 existential variable”
• Head domination: ∀ C ∈ CC(G∃[Q]) ∃ j ∈ atoms(Q) s.t. headVars(C) ⊆ vars(j), where CC denotes the connected components
• Examples:
  – $Q(y_1, y_2) \mathrel{:-} R_1(x_1, y_1), R_2(x_1, y_2), R_3(x_1, y_2)$
  – $Q(y_1, y_2, y_3) \mathrel{:-} R_1(x_1, y_1), R_2(x_1, y_2), R_3(y_1, y_2), R_4(x_2, y_3)$
  – $Q(y_1, y_2) \mathrel{:-} R_1(x, y_1), R_2(x, y_2)$, e.g., Access(u, f)
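Head domination is a purely syntactic test, so it is easy to sketch: build G∃[Q] with a small union-find over atoms that share an existential variable, then check each component against every atom of Q. This is our own illustration; `components` and `has_head_domination` are hypothetical names, and atoms are (relation, variables) pairs.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variables)


def components(head: Tuple[str, ...], atoms: List[Atom]) -> List[List[int]]:
    """Connected components of G∃[Q]: atoms are linked when they share an existential variable."""
    head_vars = set(head)
    parent = list(range(len(atoms)))

    def find(i: int) -> int:
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if (set(atoms[i][1]) & set(atoms[j][1])) - head_vars:
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(len(atoms)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def has_head_domination(head: Tuple[str, ...], atoms: List[Atom]) -> bool:
    """Every component C has some atom j of Q with headVars(C) ⊆ vars(j)."""
    head_vars = set(head)
    for comp in components(head, atoms):
        comp_head = {t for i in comp for t in atoms[i][1] if t in head_vars}
        if not any(comp_head <= set(terms) for _, terms in atoms):
            return False
    return True


ACCESS = (("u", "f"), [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))])
Q2 = (("y1", "y2", "y3"), [("R1", ("x1", "y1")), ("R2", ("x1", "y2")),
                           ("R3", ("y1", "y2")), ("R4", ("x2", "y3"))])
print(has_head_domination(*ACCESS))  # no atom contains both u and f
print(has_head_domination(*Q2))      # R3(y1, y2) dominates the {R1, R2} component
```

Note that the dominating atom may lie outside the component, as R3 does for the component formed by R1 and R2 in the second query.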

Previous Dichotomy Theorem [KVW 2011]
Let Q be a CQ over a schema S (no self joins), with no FDs [K, Vondrák, Williams, 2011]:
• Q has head domination ⇒ 1 Rel returns an optimal solution (in PTime)
• Otherwise ⇒ deciding whether a side-effect-free solution exists is NP-complete, and it is NP-hard to find an ($\alpha_Q$-approx.) optimal solution
Examples:
  – $Q(y_1, y_2) \mathrel{:-} R_1(x_1, y_1), R_2(x_1, y_2), R_3(x_1, y_2)$: PTime (1 Rel)
  – $Q(y_1, y_2, y_3) \mathrel{:-} R_1(x_1, y_1), R_2(x_1, y_2), R_3(y_1, y_2), R_4(x_2, y_3)$: PTime (1 Rel)
  – $Q(y_1, y_2) \mathrel{:-} R_1(x, y_1), R_2(x, y_2)$, e.g., Access(u, f): NP-hard

Access Example Revisited
Delete (Emma, a.txt) from Access(u, f) :- UserGroup(u, g), GroupFile(g, f), over the same UserGroup and GroupFile instance.
• No FDs: NP-hard
• file → group (group ← file): PTime
• user → group and file → group (user → group ← file): PTime
• group → user (user ← group): PTime
• group → user and group → file (user ← group → file): PTime
Every nontrivial set of FDs brings the problem to PTime.

Additional Examples
  – $Q(y, y_1, y_2) \mathrel{:-} R_1(y_1, x_1), R(x_1, y, x_2), R_2(y_2, x_2)$: NP-hard
  – $Q(y, y_1, y_2) \mathrel{:-} R_1(x_1, y_1), R(x_1, y, x_2), R_2(x_2, y_2)$: PTime
  – $Q(y, y_1, y_2) \mathrel{:-} R_1(x_1, y_1), R(x_1, y, x_2), R_2(x_2, y_2)$: NP-hard

Dichotomy with FDs
Let Q be a CQ over a schema S (no self joins).
[K, Vondrák, Williams, 2011], no FDs:
• Q has head domination ⇒ 1 Rel returns an optimal solution (in PTime)
• Otherwise ⇒ deciding whether a side-effect-free solution exists is NP-complete, and it is NP-hard to find an ($\alpha_Q$-approx.) optimal solution
This paper (FDs):
• Q+ has functional head domination ⇒ 1 Rel* returns an optimal solution (in PTime); 1 Rel* removes a tuple only if it is used for the undesired answer
• Otherwise ⇒ deciding whether a side-effect-free solution exists is NP-complete, and it is NP-hard to find an ($\alpha_Q$-approx.) optimal solution
Depending on the CQ and FDs, the problem is either straightforward or hard!
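The 1 Rel* refinement can be sketched by enumerating homomorphisms and recording, for each atom, the facts that take part in some derivation of the undesired answer; only those facts are deleted. This is our reading of "used for the undesired answer", with hypothetical names and variable-only atoms.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variables)
Database = Dict[str, Set[tuple]]


def homomorphisms(atoms: List[Atom], db: Database):
    """Yield (mapping, facts) per homomorphism; facts[i] is the image of atom i."""
    def extend(i, m, facts):
        if i == len(atoms):
            yield m, facts
            return
        rel, terms = atoms[i]
        for fact in db.get(rel, set()):
            m2 = dict(m)
            if all(m2.setdefault(t, v) == v for t, v in zip(terms, fact)):
                yield from extend(i + 1, m2, facts + [fact])
    yield from extend(0, {}, [])


def evaluate(head, atoms, db) -> Set[tuple]:
    return {tuple(m[v] for v in head) for m, _ in homomorphisms(atoms, db)}


def one_rel_star(head, atoms, db, a):
    """1 Rel*: for each atom R_i, delete only the R_i-tuples used in a derivation of a."""
    used = [set() for _ in atoms]
    for m, facts in homomorphisms(atoms, db):
        if tuple(m[v] for v in head) == a:
            for i, fact in enumerate(facts):
                used[i].add(fact)
    candidates = []
    for i, (rel, _) in enumerate(atoms):
        e = dict(db)
        e[rel] = db[rel] - used[i]
        candidates.append((rel, used[i], evaluate(head, atoms, e)))
    # max() returns the first candidate with the largest surviving view.
    return max(candidates, key=lambda c: len(c[2])), candidates


ATOMS = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
D = {"UserGroup": {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
                   ("Olivia", "db"), ("Jacob", "ai")},
     "GroupFile": {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
                   ("db", "b.txt"), ("os", "a.txt")}}
best, candidates = one_rel_star(("u", "f"), ATOMS, D, ("Emma", "a.txt"))
for rel, removed, view in candidates:
    print(rel, sorted(removed), len(view))
```

Unlike the 1 Rel sketch, the GroupFile candidate here keeps (os, a.txt), since no derivation of (Emma, a.txt) uses it; both candidates then retain 4 answers.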

FDs Among Variables
Definition: let Q be a CQ over schema S and U, V ⊆ variables(Q). Then U → V holds if
$\forall D \in db(S)\ \forall m_1, m_2 \in hom(Q \to D):\ m_1 = m_2 \text{ on } U \Rightarrow m_1 = m_2 \text{ on } V$
Example: Access(u, f) :- UserGroup(u, g), GroupFile(g, f), with FD user → group and FD group → file:
u → g, g → f, u → f, {u, g} → f
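One way to derive such variable FDs, sketched below under our own assumptions, is to translate each relation-level FD into a variable FD per atom and then take an attribute-closure-style fixpoint. The rule is sound for the definition above (two homomorphisms that agree on an FD's left-hand variables map the atom to tuples that agree on those attributes, hence on the right-hand one); we do not claim it is the paper's procedure. The FD encoding (0-based positions) and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variables)
# Hypothetical encoding: relation -> [(LHS positions, RHS position)], 0-based.
SchemaFDs = Dict[str, List[Tuple[Tuple[int, ...], int]]]


def variable_fds(atoms: List[Atom], fds: SchemaFDs):
    """Atom R(t) and FD A -> b on R induce the variable FD {t[p] : p in A} -> t[b]."""
    return [(frozenset(terms[p] for p in lhs), terms[rhs])
            for rel, terms in atoms for lhs, rhs in fds.get(rel, [])]


def closure(start: Set[str], var_fds) -> Set[str]:
    """All variables functionally determined by `start` (fixpoint iteration)."""
    result = set(start)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in var_fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


ATOMS = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
FDS = {"UserGroup": [((0,), 1)],   # user -> group
       "GroupFile": [((0,), 1)]}   # group -> file
vfds = variable_fds(ATOMS, FDS)
print(sorted(closure({"u"}, vfds)))      # u determines g and, transitively, f
print("f" in closure({"u", "g"}, vfds))  # {u, g} -> f
```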

The CQ Q+
Tractability Condition: Q+ has functional head domination
Definition: Q+ is obtained by adding to Q's head every variable x s.t. headVars → x (FDs among variables, as defined above).
Example: Access(u, f) :- UserGroup(u, g), GroupFile(g, f) with FD file → group (group ← file) ⇒ {u, f} → g, hence
Access+(u, g, f) :- UserGroup(u, g), GroupFile(g, f)
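Computing Q+ is then a single closure of the head variables. The sketch below repeats the hypothetical FD encoding and closure helper so that it runs on its own; `q_plus` is our name, and it appends the new head variables in sorted order, so the head's column order may differ from the slide's.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variables)
SchemaFDs = Dict[str, List[Tuple[Tuple[int, ...], int]]]  # relation -> [(LHS positions, RHS position)]


def variable_fds(atoms: List[Atom], fds: SchemaFDs):
    """Variable FDs induced by each atom and each FD on its relation."""
    return [(frozenset(terms[p] for p in lhs), terms[rhs])
            for rel, terms in atoms for lhs, rhs in fds.get(rel, [])]


def closure(start: Set[str], var_fds) -> Set[str]:
    """All variables functionally determined by `start`."""
    result = set(start)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in var_fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def q_plus(head: Tuple[str, ...], atoms: List[Atom], fds: SchemaFDs):
    """Q+: extend Q's head with every variable x such that headVars -> x."""
    extra = sorted(closure(set(head), variable_fds(atoms, fds)) - set(head))
    return tuple(head) + tuple(extra), atoms


ATOMS = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
file_to_group = {"GroupFile": [((1,), 0)]}  # file -> group
print(q_plus(("u", "f"), ATOMS, file_to_group)[0])  # g joins the head
print(q_plus(("u", "f"), ATOMS, {})[0])             # without FDs, Q+ = Q
```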

Functional Head Domination
Tractability Condition: Q+ has functional head domination
• Q: a CQ over a schema S
• G∃[Q]: nodes = atoms(Q); edges = “sharing ≥ 1 existential variable”
• Head domination: ∀ C ∈ CC(G∃[Q]) ∃ j ∈ atoms(Q) s.t. vars(j) ⊇ headVars(C)
• Functional head domination: ∀ C ∈ CC(G∃[Q]) ∃ j ∈ atoms(Q) s.t. vars(j) → headVars(C)
Example: Access(u, f) :- UserGroup(u, g), GroupFile(g, f); the FD group → file gives {u, g} → {u, f}, so UserGroup(u, g) functionally dominates the single connected component.
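Functional head domination only changes the containment test: instead of headVars(C) ⊆ vars(j), we ask whether the closure of vars(j) covers headVars(C). The sketch below (hypothetical names and FD encoding, same rule-based closure as before) checks Access with and without the FD group → file.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variables)
SchemaFDs = Dict[str, List[Tuple[Tuple[int, ...], int]]]  # relation -> [(LHS positions, RHS position)]


def variable_fds(atoms: List[Atom], fds: SchemaFDs):
    """Variable FDs induced by each atom and each FD on its relation."""
    return [(frozenset(terms[p] for p in lhs), terms[rhs])
            for rel, terms in atoms for lhs, rhs in fds.get(rel, [])]


def closure(start: Set[str], var_fds) -> Set[str]:
    """All variables functionally determined by `start`."""
    result = set(start)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in var_fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def components(head, atoms: List[Atom]) -> List[List[int]]:
    """Connected components of G∃[Q] (atoms linked by a shared existential variable)."""
    head_vars = set(head)
    parent = list(range(len(atoms)))

    def find(i: int) -> int:
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if (set(atoms[i][1]) & set(atoms[j][1])) - head_vars:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(atoms)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def has_functional_head_domination(head, atoms: List[Atom], fds: SchemaFDs) -> bool:
    """Every component C has an atom j of Q with vars(j) -> headVars(C)."""
    vfds = variable_fds(atoms, fds)
    head_vars = set(head)
    for comp in components(head, atoms):
        comp_head = {t for i in comp for t in atoms[i][1] if t in head_vars}
        if not any(comp_head <= closure(set(terms), vfds) for _, terms in atoms):
            return False
    return True


ACCESS_HEAD = ("u", "f")
ACCESS_ATOMS = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
group_to_file = {"GroupFile": [((0,), 1)]}  # group -> file
print(has_functional_head_domination(ACCESS_HEAD, ACCESS_ATOMS, {}))
print(has_functional_head_domination(ACCESS_HEAD, ACCESS_ATOMS, group_to_file))
```

With no FDs the test degenerates to plain head domination and fails for Access; with group → file, the closure of {u, g} reaches f and the test succeeds.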

Examples
Tractability Condition: Q+ has functional head domination
• $Q(y, y_1, y_2) \mathrel{:-} R_1(x_1, y_1), R(x_1, y, x_2), R_2(x_2, y_2)$: NP-hard
• The same CQ under different FDs, where $\{y, y_1, y_2\} \to x_2$:
  $Q^+(y, y_1, y_2, x_2) \mathrel{:-} R_1(x_1, y_1), R(x_1, y, x_2), R_2(x_2, y_2)$: PTime (1 Rel*)
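The full tractability condition (compute Q+, then test functional head domination) can be sketched end to end. The FDs for these queries are not spelled out in the text, so for the three-atom CQ the sketch assumes hypothetical FDs $x_1 \to y_1$ (on $R_1$) and $y_2 \to x_2$ (on $R_2$), which do give $\{y, y_1, y_2\} \to x_2$. It also runs Access under every non-empty subset of the four single-attribute FDs, consistent with the claim that every nontrivial set of FDs makes that case tractable. All names are ours.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
SchemaFDs = Dict[str, List[Tuple[Tuple[int, ...], int]]]  # relation -> [(LHS positions, RHS position)]


def variable_fds(atoms: List[Atom], fds: SchemaFDs):
    return [(frozenset(terms[p] for p in lhs), terms[rhs])
            for rel, terms in atoms for lhs, rhs in fds.get(rel, [])]


def closure(start: Set[str], var_fds) -> Set[str]:
    result, changed = set(start), True
    while changed:
        changed = False
        for lhs, rhs in var_fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def components(head, atoms) -> List[List[int]]:
    head_vars, parent = set(head), list(range(len(atoms)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if (set(atoms[i][1]) & set(atoms[j][1])) - head_vars:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(atoms)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def q_plus(head, atoms, fds):
    extra = sorted(closure(set(head), variable_fds(atoms, fds)) - set(head))
    return tuple(head) + tuple(extra), atoms


def tractable(head, atoms, fds) -> bool:
    """Tractability condition: Q+ has functional head domination."""
    head, atoms = q_plus(head, atoms, fds)
    vfds, head_vars = variable_fds(atoms, fds), set(head)
    return all(any({t for i in comp for t in atoms[i][1] if t in head_vars}
                   <= closure(set(terms), vfds) for _, terms in atoms)
               for comp in components(head, atoms))


Q_HEAD = ("y", "y1", "y2")
Q_ATOMS = [("R1", ("x1", "y1")), ("R", ("x1", "y", "x2")), ("R2", ("x2", "y2"))]
HYPOTHETICAL_FDS = {"R1": [((0,), 1)], "R2": [((1,), 0)]}  # assumed: x1 -> y1, y2 -> x2
print(tractable(Q_HEAD, Q_ATOMS, {}), tractable(Q_HEAD, Q_ATOMS, HYPOTHETICAL_FDS))

ACCESS_ATOMS = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
ALL_ACCESS_FDS = [("UserGroup", ((0,), 1)), ("UserGroup", ((1,), 0)),   # user -> group, group -> user
                  ("GroupFile", ((0,), 1)), ("GroupFile", ((1,), 0))]   # group -> file, file -> group
results = []
for k in range(1, len(ALL_ACCESS_FDS) + 1):
    for subset in combinations(ALL_ACCESS_FDS, k):
        fds: SchemaFDs = {}
        for rel, fd in subset:
            fds.setdefault(rel, []).append(fd)
        results.append(tractable(("u", "f"), ACCESS_ATOMS, fds))
print(all(results), len(results))
```

Under the assumed FDs, $x_2$ moves into the head, $R_2$ becomes its own component, and $R(x_1, y, x_2)$ functionally dominates the component $\{R_1, R\}$ through $x_1 \to y_1$.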

Example: Key-Preserving Views
Tractability Condition: Q+ has functional head domination
Theorem [Cong, Fan, Geerts, 2006]: Q preserves keys* ⇒ deletion propagation is in PTime
For CQs without self joins, this follows directly from our positive side:
Q preserves keys ⇒ Q+ has no existential variables ⇒ G∃[Q+] has no edges ⇒ Q+ trivially has functional head domination (every connected component is a single node, dominated by itself) ⇒ 1 Rel* returns an optimal solution
* Each relation has a key, and none of the key attributes is projected out
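The first step of that chain, "Q preserves keys ⇒ Q+ has no existential variables", can be checked mechanically: once all key variables of an atom are in the head, the key determines the rest of the atom. The sketch below uses a hypothetical two-relation schema (keys on position 0) and hypothetical names; it returns the variables that remain existential in Q+.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variables)


def existential_vars_of_q_plus(head: Tuple[str, ...], atoms: List[Atom],
                               keys: Dict[str, Tuple[int, ...]]) -> Set[str]:
    """Variables of Q that stay existential in Q+ when keys[R] is a key of R."""
    determined = set(head)
    changed = True
    while changed:
        changed = False
        for rel, terms in atoms:
            # A key determines every attribute of its relation.
            key_vars = {terms[p] for p in keys[rel]}
            if key_vars <= determined and not set(terms) <= determined:
                determined |= set(terms)
                changed = True
    body_vars = {t for _, terms in atoms for t in terms}
    return body_vars - determined


KEYS = {"R1": (0,), "R2": (0,)}
ATOMS = [("R1", ("k1", "a")), ("R2", ("k2", "k1", "b"))]
# Key-preserving: both keys k1 and k2 appear in the head.
print(existential_vars_of_q_plus(("k1", "k2", "b"), ATOMS, KEYS))  # set()
# Not key-preserving: k2 is projected out, and nothing determines it.
print(existential_vars_of_q_plus(("k1", "b"), ATOMS, KEYS))  # {'k2'}
```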

About the Proof
• The positive side is fairly simple
  – … once the tractability condition is found
• The negative side is intricate
  – Reduction from the special case of the Access CQ
  – Challenge: simulating Access(u, f) by an instance that satisfies all the FDs
  – Central concept: graph separation on the variable graph of the CQ
$Q(y_1, y_2) \mathrel{:-} R_1(y_1, x), R_2(x, y_2) \;\to\; Q'(y_1, y_2) \mathrel{:-} R_1(y_1, x), R_2(x, x_2, y_2), R_3(x_1, x_2)$

Conclusions & Ongoing Work
• Studied deletion propagation in the presence of functional dependencies
• Established a dichotomy in complexity:
  – PTime by a straightforward algorithm vs.
  – Hardness (of approximation)
• Generalizes previously established special cases: no FDs, key-preserving views
• Ongoing work: deletion of multiple answers
  – Preview: trichotomy
    • Straightforward
    • Hard but approximable (by a constant factor)
    • Hard to approximate
Questions?