Neighborhood Interchangeability (NI) for Non-Binary CSPs & Application to Databases
Anagh Lal
Constraint Systems Laboratory
Computer Science & Engineering
University of Nebraska-Lincoln
Research supported by NSF CAREER award #0133568 and by Maude Hammond Fling Faculty Research Fellowship.
Constraint Systems Laboratory, April 21, 2005. Lal: M. S. thesis defense
Main contributions
CSPs
1. Interchangeability: an algorithm for neighborhood interchangeability (NI) in non-binary CSPs
2. Dynamic bundling: integrating NI with backtrack search for solving non-binary CSPs
3. Exploratory: towards detecting substitutability
Databases
1. A new model of the join query as a CSP
2. A new sorting-based bundling algorithm
3. A new sort-merge join algorithm that produces bundled tuples
4. Exploratory: application to materialized views
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
• Empirical evaluations
• Database algorithms based on dynamic bundling
• Conclusions & future work
Constraint Satisfaction Problem
• Given P = (V, D, C)
  – V: set of variables
  – D: set of their domains
  – C: set of constraints restricting the acceptable combinations of values for variables
  – Solution: a consistent assignment of values to variables
• Query: find one solution, all solutions, etc.
• Examples: SAT, scheduling, product configuration
• NP-complete in general
[Figure: example CSP with variables V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
Systematic search
• Basic mechanism
  – DFS & backtracking (BT)
  – Variable being instantiated: current variable
  – Uninstantiated variables: future variables
  – Instantiated variables: past variables
• Constraint propagation
  – Remove values inconsistent with the constraints
  – Forward checking filters the domains of future variables given the instantiation of the current variable
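The search mechanism above can be illustrated with a minimal sketch of backtracking with forward checking (FC) that enumerates all solutions. It assumes binary constraints stored as sets of allowed value pairs; the names `compatible` and `forward_check_search` and the toy instance are hypothetical, not taken from the thesis.

```python
from typing import Dict, List, Set, Tuple

Var = str
Domains = Dict[Var, Set[str]]
# Hypothetical encoding: allowed[(x, y)] = set of (value_of_x, value_of_y) pairs.
Allowed = Dict[Tuple[Var, Var], Set[Tuple[str, str]]]


def compatible(allowed: Allowed, x: Var, vx: str, y: Var, vy: str) -> bool:
    """True if x=vx and y=vy violate no binary constraint."""
    if (x, y) in allowed:
        return (vx, vy) in allowed[(x, y)]
    if (y, x) in allowed:
        return (vy, vx) in allowed[(y, x)]
    return True  # no constraint between x and y


def forward_check_search(order: List[Var], domains: Domains,
                         allowed: Allowed) -> List[Dict[Var, str]]:
    """Depth-first backtracking with forward checking; returns all solutions."""
    solutions: List[Dict[Var, str]] = []

    def search(i: int, current: Domains, partial: Dict[Var, str]) -> None:
        if i == len(order):
            solutions.append(dict(partial))
            return
        var = order[i]  # current variable; order[:i] are past, order[i+1:] future
        for value in sorted(current[var]):
            # Forward checking: filter the future domains given var=value.
            filtered = dict(current)
            wipeout = False
            for fut in order[i + 1:]:
                filtered[fut] = {w for w in current[fut]
                                 if compatible(allowed, var, value, fut, w)}
                if not filtered[fut]:
                    wipeout = True  # domain wipe-out: try the next value
                    break
            if not wipeout:
                partial[var] = value
                search(i + 1, filtered, partial)
                del partial[var]

    search(0, domains, {})
    return solutions


# Toy instance: X != Y over {a, b}.
print(forward_check_search(["X", "Y"], {"X": {"a", "b"}, "Y": {"a", "b"}},
                           {("X", "Y"): {("a", "b"), ("b", "a")}}))
# [{'X': 'a', 'Y': 'b'}, {'X': 'b', 'Y': 'a'}]
```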
Value interchangeability [Freuder, ’91]
Equivalent values in the domain of a variable
[Figure: example CSP with variables V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
• Full Interchangeability (FI):
  – d, e, f are interchangeable for V2 in any solution
• Neighborhood Interchangeability (NI):
  – Efficiently approximates FI
  – Finds e, f but misses d
  – Computed with a discrimination tree DT(Vx)
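A discrimination tree for a binary CSP can be sketched as follows, assuming a `compatible(var, v, n, w)` predicate and fixed orders over neighbors and values. Each value of the variable walks (or extends) a path of the neighbor values consistent with it; values ending at the same node are NI. The function names and the toy instance are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple


class Node:
    """Discrimination-tree node; children are keyed by (neighbor, value) pairs."""

    def __init__(self) -> None:
        self.children: Dict[Tuple[Hashable, Hashable], "Node"] = {}
        self.annotation: List[Hashable] = []  # values whose path ends at this node


def discrimination_tree(
    var: Hashable,
    domains: Dict[Hashable, Set],
    neighbors: List[Hashable],
    compatible: Callable[[Hashable, Hashable, Hashable, Hashable], bool],
) -> List[Set]:
    """Return the NI classes of `var` (one set per annotated node)."""
    root = Node()
    ends: List[Node] = []
    for v in sorted(domains[var]):
        node = root
        for n in neighbors:                 # fixed neighbor order
            for w in sorted(domains[n]):    # fixed value order
                if compatible(var, v, n, w):
                    node = node.children.setdefault((n, w), Node())
        if not node.annotation:
            ends.append(node)
        node.annotation.append(v)
    return [set(node.annotation) for node in ends]


# Hypothetical instance: X in {1, 2, 3}, Y in {a, b}; 1 and 2 support only Y=a.
allowed = {(1, "a"), (2, "a"), (3, "a"), (3, "b")}
print(discrimination_tree("X", {"X": {1, 2, 3}, "Y": {"a", "b"}}, ["Y"],
                          lambda x, v, n, w: (v, w) in allowed))
# [{1, 2}, {3}]
```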
Bundling: using NI in search
[Figure: search trees for the example CSP after V1 = d. BT branches on each value of V2: c, d, e, f. Static bundling branches on {c}, {d}, {e, f}. Dynamic bundling branches on {c}, {d, e, f}.]
• Static bundling [Haselböck, ’93]
  – Computes NI once, before search
• Dynamic bundling [Our group, ’01]
  – Dynamically identifies NI
  – Finds larger ("fatter") solution bundles than BT and static bundling
  – Never less efficient than BT and static bundling
Robust solutions
[Figure: example CSP with variables V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
Single solution
• V1 = d
• V2 = e
• V3 = a
• V4 = c
Robust solution
• V1 ∈ {d}
• V2 ∈ {d, e, f}
• V3 ∈ {a}
• V4 ∈ {b, c}
• Solution bundle: Cartesian product of the bundles of the variables
• Solution-bundle size = 1 × 3 × 1 × 2 = 6
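The robust solution above can be checked with a short sketch that computes the bundle size as a product of bundle sizes and expands the Cartesian product. It only expands the bundle; the constraints of the example are not reproduced. The function names are hypothetical.

```python
from itertools import product
from math import prod

# Robust solution from the slide: one bundle (set of values) per variable.
solution_bundle = {
    "V1": {"d"},
    "V2": {"d", "e", "f"},
    "V3": {"a"},
    "V4": {"b", "c"},
}


def bundle_size(bundle):
    """Number of solutions represented: product of the bundle sizes."""
    return prod(len(values) for values in bundle.values())


def expand(bundle):
    """Enumerate the individual solutions (Cartesian product of the bundles)."""
    names = list(bundle)
    for combo in product(*(sorted(bundle[n]) for n in names)):
        yield dict(zip(names, combo))


print(bundle_size(solution_bundle))        # 6
print(len(list(expand(solution_bundle))))  # 6
```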
Phase transition [Cheeseman et al. ’91]
[Figure: cost of solving vs. order parameter, peaking at the critical value of the order parameter; mostly solvable problems below it, mostly unsolvable problems above it]
• Significant increase of cost around the critical value
• In CSPs, the order parameters are constraint tightness and constraint ratio
• Algorithms are compared around the phase transition
Non-binary CSPs
[Figure: constraint network with variable V {1, 2, 3, 4, 5, 6}, variables V1, V2, V3, V4 with domains {1, 2, 3}, and constraints C1 over (V, V1, V2), C2 over (V, V3), C3 over (V2, V3, V4), C4 over (V1, V4), each given as a table of allowed tuples]
• Scope(Cx): the set of variables involved in Cx
• Arity(Cx): size of the scope
Computing NI for non-binary CSPs is not a trivial extension of the binary case
CSP parameters
• n: number of variables
• a: domain size
• t: constraint tightness, the ratio of the number of disallowed tuples to the number of all possible tuples
• degree of a variable
• c_k: number of constraints of arity k
• $p_k = c_k / \binom{n}{k}$: constraint ratio
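The two ratios above reduce to simple arithmetic; a minimal sketch follows (hypothetical helper names, illustrative numbers not taken from the experiments). It uses `math.comb`, available since Python 3.8.

```python
from math import comb


def constraint_ratio(c_k: int, n: int, k: int) -> float:
    """p_k = c_k / C(n, k): fraction of all k-variable scopes that carry a constraint."""
    return c_k / comb(n, k)


def tightness(allowed_tuples: int, domain_sizes) -> float:
    """t = disallowed tuples / all possible tuples over the constraint's scope."""
    total = 1
    for size in domain_sizes:
        total *= size
    return (total - allowed_tuples) / total


# 57 ternary constraints over n = 20 variables: 57 / C(20, 3) = 57 / 1140
print(constraint_ratio(57, 20, 3))  # 0.05
# A binary constraint over two domains of size 2 that allows 3 of the 4 tuples
print(tightness(3, [2, 2]))         # 0.25
```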
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
  – Non-binary discrimination tree (nb-DT)
• Empirical evaluations
• Database algorithms based on dynamic bundling
• Conclusions & future work
NI for non-binary CSPs
1. Build an nb-DT for each constraint
  – Determines the NI sets of a variable given that constraint
2. Intersect the partitions from the nb-DTs
  – Yields the NI sets of V (a partition of D_V)
3. Process the paths in the nb-DTs
  – Gives, for free, the updates necessary for forward checking
[Figure: for V {1, 2, 3, 4, 5, 6}, nb-DT(V, C1) partitions the domain into {1, 2}, {3, 4}, {5, 6}; nb-DT(V, C2) partitions it into {1, 2}, {3, 4}, {5}, {6}]
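Step 2 can be sketched by intersecting the per-constraint partitions: two values stay together only if they share a block in every partition. The partitions below are the ones shown for nb-DT(V, C1) and nb-DT(V, C2); the function name is hypothetical.

```python
def intersect_partitions(domain, partitions):
    """Intersect several partitions of the same domain (NI sets of the variable)."""
    # Signature of a value: the index of its block in each partition.
    signature = {v: [] for v in domain}
    for partition in partitions:
        for idx, block in enumerate(partition):
            for v in block:
                signature[v].append(idx)
    groups = {}
    for v in sorted(domain):
        groups.setdefault(tuple(signature[v]), set()).add(v)
    return list(groups.values())


dt_c1 = [{1, 2}, {3, 4}, {5, 6}]    # partition from nb-DT(V, C1)
dt_c2 = [{1, 2}, {3, 4}, {5}, {6}]  # partition from nb-DT(V, C2)
print(intersect_partitions(range(1, 7), [dt_c1, dt_c2]))
# [{1, 2}, {3, 4}, {5}, {6}]
```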
Building an nb-DT: nb-DT(V, C1)
[Figure: the allowed tuples of C1 over (V, V1, V2) and the resulting nb-DT for the domain of V, {1, ..., 6}. Each path from the root is a sequence of projected tuples such as (<V1 1>, <V2 1>), (<V1 1>, <V2 3>), (<V1 3>, <V2 3>), (<V1 3>, <V2 2>), (<V1 2>, <V2 2>); the annotations at the ends of the paths are {1, 2}, {3, 4}, {5, 6}]
• Cost: $O(\mathrm{deg} \cdot a^{k+1} \cdot (1 - t))$
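Construction of an nb-DT can be sketched from the allowed tuples of one constraint: each value of V follows a path made of its projected tuples (the tuple without V), visited in sorted order, and values that share a path end at the same node. The relation `c1` below is hypothetical: the exact tuples of the slide's C1 are not recoverable here, so it was chosen only to reproduce the partition {1, 2}, {3, 4}, {5, 6}. Values of V with no supporting tuple do not appear (they are inconsistent anyway).

```python
from collections import defaultdict


class NbDTNode:
    def __init__(self):
        self.children = {}    # projected tuple -> child node
        self.annotation = []  # values of V whose path ends here


def build_nb_dt(var_index, tuples):
    """Build nb-DT(V, C); var_index is the position of V in the scope of C."""
    projections = defaultdict(list)
    for t in sorted(tuples):
        projections[t[var_index]].append(t[:var_index] + t[var_index + 1:])
    root, ends = NbDTNode(), []
    for value in sorted(projections):
        node = root
        for rest in sorted(projections[value]):  # path = sorted projected tuples
            node = node.children.setdefault(rest, NbDTNode())
        if not node.annotation:
            ends.append(node)
        node.annotation.append(value)
    # Values annotating the same node are NI with respect to C.
    return root, [set(n.annotation) for n in ends]


# Hypothetical C1 over (V, V1, V2)
c1 = [(1, 1, 1), (1, 1, 3), (1, 3, 3), (2, 1, 1), (2, 1, 3), (2, 3, 3),
      (3, 1, 1), (3, 2, 2), (4, 1, 1), (4, 2, 2), (5, 3, 2), (6, 3, 2)]
_, partition = build_nb_dt(0, c1)
print(partition)  # [{1, 2}, {3, 4}, {5, 6}]
```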
Bundling = Search + NI
• Benefits of bundling
  1. Bundles solutions
  2. Bundles no-goods
• Dynamic bundling (Dyn. Bndl)
  – Re-computes NI during search
  – Yields larger bundles, boosts the effects of bundling
• Skeptics’ objection to Dyn. Bndl
  – Costly and not worthwhile
• We show that the converse holds
[Figure: search tree whose nodes assign bundles, e.g., V ∈ {1, 2} or {3, 4}, V1 ∈ {1} or {1, 3}, V3 ∈ {1} or {3}, V4 ∈ {2} or {1}; one branch ends in a no-good bundle, another in a solution bundle]
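Dynamic bundling can be sketched as FC search that, at every node, recomputes the NI classes of the current variable with respect to the current (filtered) domains of the future variables and branches on whole bundles. For brevity this sketch assumes binary constraints and groups values by their compatible-value sets instead of building nb-DTs; all names and the toy instance are hypothetical.

```python
def compatible(allowed, x, vx, y, vy):
    """True if x=vx, y=vy satisfy the binary constraint between x and y, if any."""
    if (x, y) in allowed:
        return (vx, vy) in allowed[(x, y)]
    if (y, x) in allowed:
        return (vy, vx) in allowed[(y, x)]
    return True


def ni_bundles(allowed, var, domains, future):
    """NI classes of var's current domain w.r.t. the current domains of future variables."""
    groups = {}
    for v in sorted(domains[var]):
        sig = tuple(frozenset(w for w in domains[f] if compatible(allowed, var, v, f, w))
                    for f in future)
        groups.setdefault(sig, set()).add(v)
    return list(groups.values())


def dynamic_bundling(allowed, order, domains):
    """FC search that assigns bundles recomputed at every node; returns solution bundles."""
    solutions = []

    def search(i, current, partial):
        if i == len(order):
            solutions.append(dict(partial))
            return
        var, future = order[i], order[i + 1:]
        for bundle in ni_bundles(allowed, var, current, future):
            rep = min(bundle)  # all values in the bundle filter future domains identically
            filtered = dict(current)
            filtered[var] = bundle
            wipeout = False
            for f in future:
                filtered[f] = {w for w in current[f] if compatible(allowed, var, rep, f, w)}
                if not filtered[f]:
                    wipeout = True  # the whole bundle is a no-good
                    break
            if not wipeout:
                partial[var] = frozenset(bundle)
                search(i + 1, filtered, partial)
                del partial[var]

    search(0, domains, {})
    return solutions


allowed = {("X", "Y"): {(1, "a"), (2, "a"), (3, "b")}}
doms = {"X": {1, 2, 3}, "Y": {"a", "b"}}
print(dynamic_bundling(allowed, ["X", "Y"], doms))
# two solution bundles: X in {1, 2} with Y in {a}; X in {3} with Y in {b}
```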
Advantages of Dyn. Bndl
• We exploit nb-DTs for forward checking
• Dyn. Bndl versus FC (BT + forward checking)
  – Finding all solutions: theoretically best
  – Finding the first solution: empirical evidence
Dyn. Bndl yields multiple, robust solutions at lower cost
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
• Empirical evaluations
• Database algorithms based on dynamic bundling
• Conclusions & future work
Empirical evaluations
• Dyn. Bndl versus FC (BT + forward checking)
• Experiments
  – Effect of varying tightness
  – In the phase-transition region:
    • Effect of varying domain size
    • Effect of varying constraint ratio (CR)
• Randomly generated problems, Model B
• ANOVA to statistically compare the performance of Dyn. Bndl and FC with varying t
• t-distribution for confidence intervals
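Model B is commonly described as fixing both the number of constraints and the number of disallowed tuples per constraint. The sketch below generates instances in that style for constraints of a single arity k; it is an illustration under that assumption, not the thesis's generator, and `model_b` and its parameters are hypothetical.

```python
import random
from itertools import combinations, product


def model_b(n, a, k, c_k, t, seed=None):
    """Random CSP in the style of Model B: exactly c_k constraints of arity k,
    each forbidding exactly round(t * a**k) of the a**k possible tuples."""
    rng = random.Random(seed)
    scopes = rng.sample(list(combinations(range(n), k)), c_k)
    all_tuples = list(product(range(a), repeat=k))
    n_disallowed = round(t * len(all_tuples))
    constraints = {}
    for scope in scopes:
        forbidden = set(rng.sample(all_tuples, n_disallowed))
        # Store each relation extensionally as its allowed tuples.
        constraints[scope] = [tup for tup in all_tuples if tup not in forbidden]
    return constraints


instance = model_b(n=20, a=10, k=3, c_k=57, t=0.35, seed=1)
print(len(instance), len(next(iter(instance.values()))))  # 57 650
```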
Experimental set-up
• Generated 16 data sets
  – n ∈ {20, 30}, a ∈ {10, 15}, CR ∈ {CR1, CR2, CR3, CR4}
  – 9 to 12 values of t in [25%, 75%]
  – 1,000 instances per tightness value
• Performance measurements
  – FBS: size of the first solution bundle
  – NV: number of nodes visited in the search tree
  – CC: number of constraint checks
  – CPU time
Analysis: varying tightness (n = 20, a = 15, CR = CR3)
[Figure: #NV (hundreds) and CPU time (sec) of Dyn. Bndl and FC vs. tightness, t from 0.325 to 0.6]
• Low tightness
  – Large FBS
    • 33 at t = 0.35
    • 2254 (Dataset #13, t = 0.35)
  – Small additional cost
• Phase transition
  – Multiple solutions present
  – Maximum no-good bundling yields the maximum savings in CPU time, NV, and CC
• High tightness
  – Problems mostly unsolvable
  – Overhead of bundling minimal
FBS of Dyn. Bndl vs. tightness:
t        FBS
0.350    33.44
0.400    10.91
0.425    7.13
0.437    6.38
0.450    5.62
0.462    2.37
0.475    0.66
0.500    0.03
0.550    0.00
Analysis: varying domain size
• Increasing a in the phase-transition region
  – FBS increases: more chances for symmetry
  – CPU time decreases: more bundling of no-goods
Increasing a (n = 30):
CR     CPU-time improvement (%)    FBS
       a=10     a=15               a=10     a=15
CR1    33.3     34.3               5.5      11.9
CR2    28.6     33.0               5.5      ?
CR3    29.8     31.7               3.6      5.0
CR4    28.4     31.6               1.2      1.4
Because the benefits of Dyn. Bndl increase with domain size, Dyn. Bndl is particularly interesting for database applications, where large domains are typical.
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
• Empirical evaluations
• Database algorithms based on dynamic bundling
  – Sorting-based bundling algorithm
  – Dynamic-bundling-based join algorithm
• Conclusions & future work
Databases & CSPs
• Same computational problems, different cost models
  – Databases: minimize the number of I/O operations
  – CSP community: minimize the number of CPU operations
• Challenges for using CSP techniques in databases
  – Use lighter data structures to minimize memory usage
  – Fit into the iterator model of database engines
DB terminology → CSP terminology
• Table, relation → Constraint (relational constraint)
• Join condition → Constraint (join-condition constraint)
• Attribute → CSP variable
• Tuple in a table → Tuple in a constraint, or allowed by one
• A sequence of natural joins → All solutions to a CSP
Join operator
• $R_1 \bowtie_{x \,\theta\, y} R_2$
  – Most expensive operator in terms of I/O
  – θ is "=": equi-join
  – x and y are the same attribute: natural join
• Join algorithms
  – Nested loop
  – Sorting-based
    • Sort-Merge, Progressive Merge-Join (PMJ)
    • Partition relations by sorting, minimizing the number of scans of the relations
  – Hashing-based
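The sorting-based family can be illustrated with a textbook in-memory sort-merge equi-join: sort both inputs on the join key, advance the side with the smaller key, and emit the cross product of each run of equal keys. This sketch ignores the external sorting and I/O aspects that matter in a database engine; the function name and the toy relations are hypothetical.

```python
from operator import itemgetter


def sort_merge_join(r1, r2, key1, key2):
    """Equi-join of r1 and r2 on key1(t1) == key2(t2) by sorting and merging."""
    r1 = sorted(r1, key=key1)
    r2 = sorted(r2, key=key2)
    i = j = 0
    out = []
    while i < len(r1) and j < len(r2):
        k1, k2 = key1(r1[i]), key2(r2[j])
        if k1 < k2:
            i += 1
        elif k1 > k2:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(r1) and key1(r1[i_end]) == k1:
                i_end += 1
            j_end = j
            while j_end < len(r2) and key2(r2[j_end]) == k2:
                j_end += 1
            for t1 in r1[i:i_end]:
                for t2 in r2[j:j_end]:
                    out.append(t1 + t2)
            i, j = i_end, j_end
    return out


r1 = [(1, "x"), (2, "y"), (2, "z")]
r2 = [(2, "p"), (3, "q"), (2, "r")]
print(len(sort_merge_join(r1, r2, itemgetter(0), itemgetter(0))))  # 4
```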
The join query
SELECT R2.A, R2.B, R2.C
FROM R1, R2
WHERE R1.A = R2.A AND R1.B = R2.B AND R1.C = R2.C
R1 ⋈ R2 (compacted):
A         B               C
{1, 5}    {12, 13, 14}    {23}
{2, 4}    {10}            {25}
{6}       {13, 14}        {27}
Result: 10 tuples in 3 nested tuples
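The count "10 tuples in 3 nested tuples" follows from expanding each nested tuple as the Cartesian product of its sets (2·3·1 + 2·1·1 + 1·2·1 = 10). A small sketch using the compacted result on this slide; the function name is hypothetical.

```python
from itertools import product

# Compacted join result from the slide: one set of values per attribute (A, B, C).
nested_result = [
    ({1, 5}, {12, 13, 14}, {23}),
    ({2, 4}, {10}, {25}),
    ({6}, {13, 14}, {27}),
]


def unnest(nested):
    """Expand each nested tuple into its flat tuples."""
    for a_vals, b_vals, c_vals in nested:
        yield from product(sorted(a_vals), sorted(b_vals), sorted(c_vals))


flat = list(unnest(nested_result))
print(len(flat), "tuples in", len(nested_result), "nested tuples")
# 10 tuples in 3 nested tuples
```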
Modeling the join query as a CSP
• Attributes of relations → CSP variables
• Attribute values → variable domains
• Relations → relational constraints
• Join conditions → join-condition constraints
SELECT R1.A, R1.B, R1.C
FROM R1, R2
WHERE R1.A = R2.A AND R1.B = R2.B AND R1.C = R2.C
[Figure: constraint network with variables R1.A, R1.B, R1.C, R2.A, R2.B, R2.C; relational constraint R1 over R1.A, R1.B, R1.C; relational constraint R2 over R2.A, R2.B, R2.C; join-condition constraints between R1.A and R2.A, R1.B and R2.B, R1.C and R2.C]
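The mapping can be sketched by building the CSP for the query and solving it naively by generate-and-test; each solution is one tuple of the join. R1 below is a fragment of the example relation used on these slides and R2 is made up for illustration; `join_as_csp` and `solve` are hypothetical names, and generate-and-test stands in for a real search algorithm.

```python
from itertools import product

R1 = {(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23)}
R2 = {(1, 12, 23), (2, 10, 25), (6, 13, 27)}  # hypothetical


def join_as_csp(r1, r2, attrs=("A", "B", "C")):
    k = len(attrs)
    # Attributes of relations -> CSP variables; attribute values -> domains.
    variables = [f"R1.{x}" for x in attrs] + [f"R2.{x}" for x in attrs]
    domains = {f"R{i}.{x}": sorted({t[j] for t in rel})
               for i, rel in ((1, r1), (2, r2)) for j, x in enumerate(attrs)}
    # Relations -> relational constraints; join conditions -> equality constraints.
    constraints = [
        (variables[:k], lambda vals: vals in r1),
        (variables[k:], lambda vals: vals in r2),
    ] + [((f"R1.{x}", f"R2.{x}"), lambda vals: vals[0] == vals[1]) for x in attrs]
    return variables, domains, constraints


def solve(variables, domains, constraints):
    """Naive generate-and-test over all assignments."""
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(check(tuple(assignment[v] for v in scope)) for scope, check in constraints):
            yield assignment


solutions = list(solve(*join_as_csp(R1, R2)))
print(len(solutions))  # 2, i.e., the tuples (1, 12, 23) and (2, 10, 25)
```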
Progressive Merge Join
• PMJ: a sort-merge algorithm by [Dittrich et al. ’03]
• Two phases
  1. Sorting phase: sorts subsets of the relations and produces early results
  2. Merging phase: merges the sorted subsets
• We use the framework of the PMJ for our external join
New join algorithm
• Sorting and merging phases
  – Load subsets of the relations into memory
  – Compute the in-memory join using dynamic bundling
• In-memory join
  – Uses sorting-based bundling (shown next)
  – Computes the join of the in-memory relations using dynamically computed bundles
Computing a bundle of R1.A
• Partition of a constraint
  – The tuples of the relation having the same value of R1.A
• Compare the projected tuples of the first partition with those of another partition
• Compare with every other partition to get the complete bundle
R1 (sorted; one partition per value of A):
A    B    C
1    12   23
1    13   23
1    14   23
2    10   25
5    12   23
5    13   23
5    14   23
• The partitions for A = 1 and A = 5 are symmetric; the partition for A = 2 is unequal to them
• Bundle: {1, 5}
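The comparison of partitions can be sketched on the tuples of R1 shown on this slide: sort the relation, cut it into one partition per value of A, and add to the bundle every later partition whose projected tuples equal those of the first unprocessed partition. The `processed` set plays the role of the processed-values array; the function name is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# The tuples of R1 (A, B, C) shown on this slide.
R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23)]


def bundles_of_first_attribute(relation):
    """Bundle the values of the first attribute whose partitions are symmetric."""
    rows = sorted(relation)  # sort on (A, B, C): each partition is a contiguous run
    # Partition: tuples sharing a value of A, projected on the remaining attributes.
    partitions = [(a, tuple(r[1:] for r in run))
                  for a, run in groupby(rows, key=itemgetter(0))]
    processed = set()  # values already placed in a bundle
    result = []
    for i, (a, proj) in enumerate(partitions):
        if a in processed:
            continue
        bundle = {a}
        for b, other in partitions[i + 1:]:
            if b not in processed and other == proj:  # symmetric partitions
                bundle.add(b)
        processed |= bundle
        result.append(bundle)
    return result


print(bundles_of_first_attribute(R1))  # [{1, 5}, {2}]
```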
Experiments
• XXL library for implementation and evaluation
• Data sets
  – Random: 2 relations R1, R2 with the same schema as the example
    • Each relation: 10,000 tuples
    • Memory size: 4,000 tuples
    • Page size: 200 tuples
  – Real-world problem: 3 relations, 4 attributes
• Compaction rate achieved
  – Random problem: 1.48
    • Savings even with a (very) preliminary implementation
  – Real-world problem: 2.26 (69 tuples in 32 nested tuples)
Outline
• Background
• Neighborhood Interchangeability (NI) for non-binary CSPs
• Empirical evaluations
• Database algorithms based on dynamic bundling
• Conclusions & future work
Conclusions
• Algorithm for computing NI sets in non-binary CSPs
• Dyn. Bndl
  – produces multiple robust solutions
  – significantly reduces the cost of search at the phase transition
• New dynamic-bundling-based join algorithm
Constraint Processing inspires innovative solutions to fundamental, difficult problems in Databases
Future work
• Sort constraint definitions to improve CSP techniques
• Design bundling mechanisms for gap and linear constraints in Constraint Databases
• Explore benefits of bundling in Databases
  – Sampling operator
  – Main-memory databases
  – Automatic categorization of query results
Thanks!
Related work
• Join algorithms
  – Well-established algorithms
  – Do not focus on exploiting symmetry
• Database compression
  – Output results are not compressed
  – Compression at the value level, not at the tuple level
Related work (contd)
• [Mamoulis & Papadias 1998]
  – Join using FC for spatial databases
  – Restricted to binary constraints
  – No compaction of the solution space
• [Bayardo et al. 1996]
  – Reduce the number of intermediate tuples of a sequence of joins
• [Rich et al. 1993]
  – Does not compact join-attribute values
  – Does not detect the redundancy present in the grouped sub-relations
Analysis of overheads
• For bundling
  – Additional data structures: 2 arrays, 1 pointer
  – Only 1 array (processed values) may become cumbersome
• The array is largest
  – when all the values of a variable are in one bundle
  – but this case also leads to the best savings!
Sorting-based bundling
• Heuristic for variable ordering
  – Place variables linked by join conditions as close to each other as possible
[Figure: constraint network over R1.A, R1.B, R1.C, R2.A, R2.B, R2.C with relational constraints R1 and R2]
§ Sort the relations using the above ordering
§ Next: compute the bundles of the first variable in the ordering (R1.A)
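One simple way to realize the ordering heuristic, assuming it is enough to list the two attributes of each join condition next to each other in the order the conditions are given, is sketched below together with sorting a relation on that ordering. Both functions are hypothetical illustrations, not the thesis's implementation.

```python
def join_aware_order(join_conditions):
    """Place the two attributes of each join condition next to each other."""
    order = []
    for left, right in join_conditions:
        for attr in (left, right):
            if attr not in order:
                order.append(attr)
    return order


def sort_relation(rows, schema, order):
    """Sort tuples lexicographically on the relation's attributes, taken in `order`."""
    positions = [schema.index(attr) for attr in order if attr in schema]
    return sorted(rows, key=lambda row: tuple(row[p] for p in positions))


conditions = [("R1.A", "R2.A"), ("R1.B", "R2.B"), ("R1.C", "R2.C")]
order = join_aware_order(conditions)
print(order)  # ['R1.A', 'R2.A', 'R1.B', 'R2.B', 'R1.C', 'R2.C']
print(sort_relation([(5, 12, 23), (1, 13, 23)], ["R1.A", "R1.B", "R1.C"], order))
# [(1, 13, 23), (5, 12, 23)]
```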
Join using bundling
• Computing the bundle for R1.A
  – Select partitions to compare for R1.A
  – Symmetric partitions: add to the bundle of R1.A; current bundle of R1.A = {1, 5}
  – Update the processed values for R1.A
• Computing the bundle for R2.A
  – Select the partition to compare
  – Symmetric partitions: add to the bundle of R2.A; current bundle of R2.A = {1, 5}
  – Update the processed values for R2.A
[Figure: relations R1 and R2 (attributes A, B, C), each with an array of processed values (value 5 marked as processed), next to the constraint network over R1.A, R2.A, R1.B, R2.B, R1.C, R2.C]
Join using bundling (contd)
• Current bundle of R1.A = {1, 5}; current bundle of R2.A = {1, 5}
• Compute Common(R1.A, R2.A) = {1, 5}
  – Assign {1, 5} to R1.A and to R2.A
  – Processed values: {1, 5} for both R1.A and R2.A
• Compute the current constraint of R2
• Next variable: R1.B
  – Compute the current constraint of R1.B
[Figure: constraint network with R1.A and R2.A both assigned {1, 5}]
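A simplified in-memory sketch of these steps bundles only attribute A: compute the bundles of R1.A and R2.A, take Common(R1.A, R2.A) for each pair of bundles, and join the current constraints on (B, C). Because every value in a bundle shares the same projected tuples, each nested tuple (Common, joined rest) stands for all of its flat tuples. Bundles are found by grouping identical partitions with a dictionary rather than by pairwise comparison. R1 holds the tuples shown earlier, R2 is hypothetical, and the algorithm in the thesis continues with R1.B and runs inside the PMJ framework.

```python
from itertools import groupby
from operator import itemgetter

R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23)]
R2 = [(1, 12, 23), (1, 13, 23), (5, 12, 23), (5, 13, 23),
      (2, 10, 25), (4, 10, 25)]  # hypothetical


def a_bundles(relation):
    """Bundles of A: values whose partitions (tuples projected on B, C) are identical."""
    groups = {}
    for a, run in groupby(sorted(relation), key=itemgetter(0)):
        groups.setdefault(tuple(r[1:] for r in run), set()).add(a)
    return [(bundle, set(proj)) for proj, bundle in groups.items()]


def bundled_join(r1, r2):
    """Natural join on (A, B, C) producing nested tuples (bundle of A, set of (B, C))."""
    nested = []
    for bundle1, proj1 in a_bundles(r1):
        for bundle2, proj2 in a_bundles(r2):
            common = bundle1 & bundle2  # Common(R1.A, R2.A)
            if not common:
                continue
            rest = proj1 & proj2        # join the current constraints on B and C
            if rest:
                nested.append((common, rest))
    return nested


print(bundled_join(R1, R2))
# two nested tuples: A in {1, 5} with (B, C) in {(12, 23), (13, 23)}; A in {2} with {(10, 25)}
```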