Symmetry Detection in Constraint Satisfaction Problems & its Application in Databases
Berthe Y. Choueiry
Constraint Systems Laboratory, Department of Computer Science & Engineering, University of Nebraska-Lincoln
Joint work with Amy Beckwith-Davis, Anagh Lal, and Eugene C. Freuder
Supported by NSF CAREER award #0133568
Constraint Systems Laboratory, December 9, 2005, ISI AI Seminar Series

Outline
• Definitions
  – CSP
  – Interchangeability
  – Bundling
• Bundling in CSPs
• Bundling for join query computation
• Conclusions

Constraint Satisfaction Problem (CSP)
• Given P = (V, D, C)
  – V: set of variables
  – D: set of their domains
  – C: set of constraints (relations) restricting the acceptable combinations of values for variables
  – Solution: a consistent assignment of values to variables
• Query: find 1 solution, all solutions, etc.
• Examples: SAT, scheduling, product configuration
• NP-complete in general
[Figure: example constraint graph with V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]

Solution techniques (simplified)
• Search
  – Backtrack search
    • Constructive
    • Complete (in theory) and sound
  – Iterative repair
    • Repairs a complete but inconsistent assignment of values to variables by doing local repairs
    • In general, neither sound nor complete
• Constraint propagation
  – Removes from the problem values (or combinations of values) that are inconsistent with the constraints
  – In general, efficient (polynomial time)
[Figure: domains {1, 2, 3}, {1, 2, 3, 4}, {1, 2}]

Backtrack search
• DFS + backtracking (linear space)
  – Variable being instantiated: current variable
  – Un-instantiated variables: future variables
  – Instantiated variables: past variables
• + Constraint propagation
  – Backtrack search with forward checking (FC)
[Figure: search tree over V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}, ending in the solution V1 = d, V2 = e, V3 = a, V4 = c]

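The search scheme above can be sketched as follows. This is a minimal illustration, not the speaker's implementation. The domains come from the slide, but the constraint figure does not survive in the text, so the four not-equal constraints in NEQ_EDGES are an assumption chosen to be consistent with the solutions and bundles quoted elsewhere in the talk. The names fc_search and neighbors are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set

# Domains as listed on the slide.
DOMAINS: Dict[str, Set[str]] = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}

# ASSUMPTION: the constraints are not given in the text. These "not equal"
# edges are hypothetical, picked to agree with the solutions on the slides.
NEQ_EDGES = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def neighbors(var: str) -> List[str]:
    """Variables sharing a constraint with var."""
    return [y if x == var else x for x, y in NEQ_EDGES if var in (x, y)]


def fc_search(
    domains: Dict[str, Set[str]],
    order: List[str],
    assignment: Optional[Dict[str, str]] = None,
) -> Iterator[Dict[str, str]]:
    """Depth-first backtrack search with forward checking (FC)."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(order):
        yield dict(assignment)
        return
    current = order[len(assignment)]          # current variable
    for value in sorted(domains[current]):
        # Copy the domains, then filter the future variables.
        future = {v: set(d) for v, d in domains.items()}
        future[current] = {value}
        consistent = True
        for n in neighbors(current):
            if n not in assignment:           # only future variables
                future[n].discard(value)      # enforce n != current
                if not future[n]:             # domain wipe-out: backtrack
                    consistent = False
                    break
        if consistent:
            assignment[current] = value
            yield from fc_search(future, order, assignment)
            del assignment[current]


solutions = list(fc_search(DOMAINS, ["V1", "V2", "V3", "V4"]))
print(len(solutions), solutions[0])
```

Under the assumed constraints, the solution shown on the slide (V1 = d, V2 = e, V3 = a, V4 = c) is among those enumerated.
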
Interchangeability [Freuder, 91]
• Captures the idea of symmetry between solutions
• Functional interchangeability
  – Any mapping between two solutions
  – Including permutation of values across variables, equivalent to graph isomorphism
• Full interchangeability (FI)
  – Restricted to values of a single variable
  – Also, likely intractable
[Figure: example CSP with two solutions, (V1 = d, V2 = c, V3 = a, V4 = b) and (V1 = d, V2 = c, V3 = b, V4 = a); the values {d, e, f} of V2 are marked "in every solution"]

Value interchangeability [Freuder, 91]
• Full Interchangeability (FI):
  – d, e, f interchangeable for V2 in any solution
• Neighborhood Interchangeability (NI):
  – Considers only the neighborhood of the variable
  – Finds e, f but misses d
  – Efficiently approximates FI
[Figure: discrimination tree DT(V2) for the example CSP with V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]

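A discrimination tree for NI can be sketched as a trie: each value of the variable follows a path labelled by the (neighbor, value) pairs it is consistent with, taken in a fixed order, and values that stop at the same node are neighborhood interchangeable. The sketch below is illustrative only. The not-equal constraints are an assumption (the constraint figure is not in the text), and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

DOMAINS: Dict[str, Set[str]] = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}

# ASSUMPTION: hypothetical "not equal" constraints, consistent with the slides.
NEQ_EDGES: List[Tuple[str, str]] = [
    ("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4"),
]


def new_node() -> dict:
    return {"children": {}, "values": []}


def discrimination_tree(var: str, domains: Dict[str, Set[str]]) -> dict:
    """Build the tree: one root-to-node path per value of var."""
    root = new_node()
    nbrs = sorted(y if x == var else x for x, y in NEQ_EDGES if var in (x, y))
    for value in sorted(domains[var]):
        node = root
        for n in nbrs:                        # fixed order over neighbors
            for w in sorted(domains[n]):      # fixed order over their values
                if w != value:                # (n, w) is consistent with value
                    node = node["children"].setdefault((n, w), new_node())
        node["values"].append(value)          # annotate the node reached
    return root


def ni_sets(root: dict) -> List[List[str]]:
    """Collect the annotations: each one is a set of NI values."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node["values"]:
            found.append(node["values"])
        stack.extend(node["children"].values())
    return sorted(found)


print(ni_sets(discrimination_tree("V2", DOMAINS)))
```

With the assumed constraints, e and f land on the same node while d ends on a different path because V3 also contains d, which matches "finds e, f but misses d".
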
Outline
• Definitions
• Bundling in CSPs
  – Static bundling
  – Dynamic bundling for non-binary CSPs
• Bundling for join query computation
• Conclusions

Bundling: using NI in search
• Static bundling [Haselböck, 93]
  – Before search: compute & store NI sets
  – During search, when a variable:
    • is a future variable: forward checking removes bundles of equivalent values
    • is the current variable: assign a bundle of equivalent values
• Advantages
  – Reduces search space
  – Creates bundled solutions
[Figure: search tree with static bundling on the example CSP; V2 branches into c, {e, f}, and d, giving the bundled solution V1 = d, V2 = {e, f}, V3 = a, V4 = {b, c}]

Dynamic bundling (Dyn. Bndl) [2001]
• Dynamically identifies NI
• Uses the discrimination tree for forward checking
  – Is never less efficient than BT & static bundling
[Figure: on the example CSP, static bundling branches V2 into c, {e, f}, and d, whereas dynamic bundling branches V2 into {c} and {d, e, f}; the discrimination tree is built over <V3, a>, <V3, b>, <V4, a>, <V4, b>, <V4, c>]

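The difference between static and dynamic bundling can be illustrated by recomputing the NI partition of V2 on the domains as they stand during search. The sketch below groups values by a hashed support signature instead of building an explicit discrimination tree, and it assumes hypothetical not-equal constraints because the constraint figure is not in the text. ni_partition is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Set

# ASSUMPTION: hypothetical "not equal" constraints, consistent with the slides.
NEQ_EDGES = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def ni_partition(var: str, domains: Dict[str, Set[str]]) -> List[List[str]]:
    """Group the values of var by the neighbor values they are consistent with."""
    groups = defaultdict(list)
    for value in sorted(domains[var]):
        support = frozenset(
            (other, w)
            for x, y in NEQ_EDGES if var in (x, y)
            for other in [y if x == var else x]
            for w in domains[other] if w != value
        )
        groups[support].append(value)
    return sorted(groups.values())


# Before search: the static NI sets.
before = {"V1": {"d"}, "V2": {"c", "d", "e", "f"},
          "V3": {"a", "b", "d"}, "V4": {"a", "b", "c"}}
# After assigning V1 = d, forward checking removes d from V3.
after_v1 = dict(before, V3={"a", "b"})

static_sets = ni_partition("V2", before)
dynamic_sets = ni_partition("V2", after_v1)
print(static_sets, dynamic_sets)
```

Once d has been filtered out of V3, nothing distinguishes d from e and f any more, so the bundle of V2 grows from {e, f} to {d, e, f}.
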
Non-binary CSPs
• Scope(Cx): the set of variables involved in Cx
• Arity(Cx): size of scope
Computing NI for non-binary CSPs is not a trivial extension from binary CSPs
[Figure: constraint network with V {1, 2, 3, 4, 5, 6} and the other variables with domain {1, 2, 3}, linked by constraints C1 to C4]
[Table: allowed tuples of C1 (V, V1, V2), C2 (V, V3), C3 (V2, V3, V4), and C4 (V1, V4)]

NI for non-binary CSPs [2003, 2005]
1. Building an nb-DT for each constraint
   – Determines the NI sets of a variable given a constraint
2. Intersecting partitions from nb-DTs
   – Yields NI sets of V (partition of D_V): {1, 2} {3, 4} {5} {6}
3. Processing paths in nb-DTs
   – Gives, for free, the updates necessary for forward checking
[Figure: constraint network around V {1, 2, 3, 4, 5, 6}; nb-DT(V, C1) with leaves {1, 2}, {3, 4}, {5, 6}; nb-DT(V, C2) with leaves {1, 2}, {3, 4}, {5}, {6}]

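Step 2 can be sketched as a plain intersection of two partitions of the domain of V. The input partitions below are read from the slide's two trees; which leaves belong to which tree is inferred from the extracted figure, though the intersection is the same either way. intersect_partitions is a hypothetical helper name.

```python
from typing import Iterable, List, Set


def intersect_partitions(
    p1: Iterable[Set[int]], p2: Iterable[Set[int]]
) -> List[List[int]]:
    """Coarsest partition that refines both p1 and p2."""
    p2 = list(p2)
    blocks = []
    for a in p1:
        for b in p2:
            common = a & b
            if common:                 # keep only non-empty overlaps
                blocks.append(sorted(common))
    return sorted(blocks)


from_c1 = [{1, 2}, {3, 4}, {5, 6}]     # NI sets of V given C1
from_c2 = [{1, 2}, {3, 4}, {5}, {6}]   # NI sets of V given C2

ni_sets_of_v = intersect_partitions(from_c1, from_c2)
print(ni_sets_of_v)
```

Two values stay in the same NI set only if every constraint on V keeps them together, which is why 5 and 6 are separated here.
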
Robust solutions
Single solution: V1 = d, V2 = e, V3 = a, V4 = c
Static bundling: V1 = d, V2 = {e, f}, V3 = a, V4 = {b, c}
Dynamic bundling: V1 = d, V2 = {d, e, f}, V3 = a, V4 = {b, c}
• Solution bundle
  – Cartesian product of domain bundles
  – Compact representation
  – Robust solutions
• Dynamic bundling finds larger bundles

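Because a solution bundle is a Cartesian product of domain bundles, its size is the product of the bundle sizes, and its members can be enumerated on demand. A small sketch using the two bundles from the slide (the helper names bundle_size and expand are illustrative):

```python
from itertools import product
from math import prod
from typing import Dict, List, Set

static_bundle = {"V1": {"d"}, "V2": {"e", "f"}, "V3": {"a"}, "V4": {"b", "c"}}
dynamic_bundle = {"V1": {"d"}, "V2": {"d", "e", "f"}, "V3": {"a"}, "V4": {"b", "c"}}


def bundle_size(bundle: Dict[str, Set[str]]) -> int:
    """Number of solutions represented by the bundle."""
    return prod(len(values) for values in bundle.values())


def expand(bundle: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Enumerate the individual solutions in the bundle."""
    names = list(bundle)
    return [
        dict(zip(names, combo))
        for combo in product(*(sorted(bundle[n]) for n in names))
    ]


single = {"V1": "d", "V2": "e", "V3": "a", "V4": "c"}
print(bundle_size(static_bundle), bundle_size(dynamic_bundle))
print(single in expand(dynamic_bundle))
```

The static bundle stands for 4 solutions and the dynamic bundle for 6; the single solution belongs to both, so a value that becomes unavailable can be swapped for another value of the same bundle.
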
Dyn. Bndl: worth the effort?
• Finds larger bundles
• Enables forward checking at no extra cost
• Does not cost more than BT or static bundling
  – Cost model:
    • # nodes visited by search
    • # constraint checks made
  – Theoretical guarantee holds
    • for finding all solutions
    • under the same variable ordering
• Finding the first solution?
  – Experiments uncover an unexpected benefit

Bundling of no-goods…
• … is particularly effective
[Figure: search on the non-binary example (V {1, 2, 3, 4, 5, 6}, other variables {1, 2, 3}, constraints C1 to C4) showing a no-good bundle and a solution bundle]

Experimental set-up
• CSP parameters:
  – n: number of variables {20, 30}
  – a: domain size {10, 15}
  – t: constraint tightness [25%, 75%]
  – CR: constraint ratio (arity: 2, 3, 4)
  – 1,000 instances per tightness value
• Phase transition
• Performance measures
  – Nodes visited (NV)
  – Constraint checks (CC)
  – CPU time
  – First Bundle Size (FBS)
[Figure: cost of solving versus order parameter; mostly solvable instances below the critical value, mostly unsolvable instances above it]

Empirical evaluations
• Dyn. Bndl versus FC (BT + forward checking)
• Randomly generated problems, Model B
• Experiments
  – Effect of varying tightness
  – In the phase-transition region
    • Effect of varying domain size
    • Effect of varying constraint ratio (CR)
• ANOVA to statistically compare performance of Dyn. Bndl and FC with varying t
• t-distribution for confidence intervals

Analysis: Varying tightness (n=20, a=15, CR=CR3)
• Low tightness
  – Large FBS
    • 33 at t=0.35
    • 2254 (Dataset #13, t=0.35)
  – Small additional cost
• Phase transition
  – Multiple solutions present
  – Maximum no-good bundling causes max savings in CPU time, NV, & CC
• High tightness
  – Problems mostly unsolvable
  – Overhead of bundling minimal

t       FBS
0.350   33.44
0.400   10.91
0.425   7.13
0.437   6.38
0.450   5.62
0.462   2.37
0.475   0.66
0.500   0.03
0.550   0.00
[Figure: CPU time [sec] and NV (hundreds) versus tightness for Dyn. Bndl and FC]

Analysis: Varying domain size
• Increasing a in phase transition
  – FBS increases: more chances for symmetry
  – CPU-time improvement also increases: more bundling of no-goods

Increasing a (n=30)
CR     Improv (CPU) %      FBS
       a=10     a=15       a=10    a=15
CR1    33.3     34.3       5.5     11.9
CR2    28.6     33.0       5.5
CR3    29.8     31.7       3.6     5.0
CR4    28.4     31.6       1.2     1.4

Because the benefits of Dyn. Bndl increase with increasing domain size, Dyn. Bndl is particularly interesting for database applications where large domains are typical

Outline
• Definitions
• Bundling in CSPs
• Bundling for join query computation
  – Idea
  – A CSP model for the join query
  – Sorting-based bundling algorithm
  – Dynamic-bundling-based join algorithm
• Conclusions

The join query
SELECT R2.A, R2.B, R2.C
FROM R1, R2
WHERE R1.A=R2.A AND R1.B=R2.B AND R1.C=R2.C

R1 ⋈ R2 (compacted)
A         B               C
{1, 5}    {12, 13, 14}    {23}
{2, 4}    {10}            {25}
{6}       {13, 14}        {27}
Result: 10 tuples in 3 nested tuples

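The compacted result can be expanded back into flat tuples by taking the Cartesian product inside each nested tuple. The sketch below uses the three nested tuples from the slide; expand_nested is a hypothetical helper, not part of any database library.

```python
from itertools import product
from typing import List, Set, Tuple

NestedTuple = Tuple[Set[int], Set[int], Set[int]]

# The compacted join result (A, B, C) as shown on the slide.
compacted: List[NestedTuple] = [
    ({1, 5}, {12, 13, 14}, {23}),
    ({2, 4}, {10}, {25}),
    ({6}, {13, 14}, {27}),
]


def expand_nested(nested: List[NestedTuple]) -> List[Tuple[int, int, int]]:
    """Flatten nested tuples: one flat tuple per combination of values."""
    flat = []
    for bundle in nested:
        flat.extend(product(*(sorted(values) for values in bundle)))
    return sorted(flat)


flat_tuples = expand_nested(compacted)
print(len(compacted), "nested tuples ->", len(flat_tuples), "tuples")
```

The three nested tuples expand to 6 + 2 + 2 = 10 flat tuples, which is the count stated on the slide.
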
Databases & CSPs
• Same computational problems, different cost models
  – Databases: minimize # I/O operations
  – CSP community: # CPU operations
• Challenges for using CSP techniques in DB
  – Use of lighter data structures to minimize memory usage
  – Fit in the iterator model of database engines

DB terminology               CSP terminology
Table, relation              Constraint (relational constraint)
Join condition               Constraint (join-condition constraint)
Attribute                    CSP variable
Tuple in a table             Tuple in a constraint or allowed by one
Computing a join sequence    Finding all solutions to a CSP

Join query as a CSP
• R1 ⋈(x θ y) R2
  – Most expensive operator in terms of I/O
  – θ is "=": equi-join
    • x is the same as y: natural join
• CSP model
  – Attributes of relations → CSP variables
  – Attribute values → variable domains
  – Relations → relational constraints
  – Join conditions → join-condition constraints

SELECT R1.A, R1.B, R1.C
FROM R1, R2
WHERE R1.A=R2.A AND R1.B=R2.B AND R1.C=R2.C

Join algorithms
• Join algorithms
  – Nested loop
  – Sorting-based
    • Two steps: sorting, merging
    • Partitions relations by sorting, minimizes # scans of relations
  – Hashing-based
• Progressive Merge-Join (PMJ) [Dittrich et al. 03]
  – PMJ: a sort-merge algorithm
  – Two phases
    1. Sorting phase: sorts sub-sets of relations & …
    2. Merging phase: merges sorted sub-sets
  – PMJ produces early results
  – We use the framework of the PMJ

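For readers unfamiliar with sorting-based joins, the two steps (sorting, merging) can be sketched in memory as follows. This is a textbook sort-merge equi-join, not the Progressive Merge-Join itself; it ignores paging and early results. The function name and the key arguments are illustrative.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Row = Tuple
Key = Callable[[Row], Hashable]


def sort_merge_join(
    r1: Sequence[Row], r2: Sequence[Row], key1: Key, key2: Key
) -> List[Tuple[Row, Row]]:
    """Equi-join of r1 and r2 on key1(row1) == key2(row2)."""
    # Step 1: sorting.
    s1, s2 = sorted(r1, key=key1), sorted(r2, key=key2)
    # Step 2: merging, one scan of each sorted input.
    out, i, j = [], 0, 0
    while i < len(s1) and j < len(s2):
        k1, k2 = key1(s1[i]), key2(s2[j])
        if k1 < k2:
            i += 1
        elif k1 > k2:
            j += 1
        else:
            # Find the run of rows in s2 that share this key.
            j_end = j
            while j_end < len(s2) and key2(s2[j_end]) == k1:
                j_end += 1
            # Pair every matching row of s1 with that run.
            while i < len(s1) and key1(s1[i]) == k1:
                out.extend((s1[i], row2) for row2 in s2[j:j_end])
                i += 1
            j = j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c")]
right = [(2, "x"), (3, "y"), (2, "z")]
joined = sort_merge_join(left, right, key1=lambda r: r[0], key2=lambda r: r[0])
print(joined)
```

Sorting groups equal keys into contiguous runs, so the merge step only has to compare the heads of the two inputs.
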
New join algorithm
• Sorting phase
  – Loads sub-sets of relations in memory
  – Uses sorting-based bundling (shown next)
• Merging phase
  – Computes the join of in-memory relations using dynamically computed bundles

Sorting-based bundling
• Heuristic for variable ordering
  – Place variables linked by join conditions as close to each other as possible
• Sort relations using the above ordering
• Next: compute bundles of the variable ahead in the variable ordering (R1.A)
[Figure: attributes R1.A, R1.B, R1.C of R1 linked by join conditions to R2.A, R2.B, R2.C of R2]

Computing a bundle of R1.A
• Partition of a constraint
  – Tuples of the relation having the same value of R1.A
• Compare projected tuples of the first partition with those of another partition
• Compare with every other partition to get the complete bundle

R1
A    B     C
1    12    23
1    13    23
1    14    23
2    10    25
5    12    23
5    13    23
5    14    23
Partitions A=1 and A=5 are symmetric; partition A=2 is unequal. Bundle: {1, 5}

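The partition comparison can be sketched on the R1 table from the slide. The talk describes a sorting-based procedure that compares partitions pairwise; the sketch below swaps in a hash-based grouping of the projected tuples, which yields the same bundles on in-memory data but is not the speaker's algorithm. bundles_on_first_attribute is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# R1 (A, B, C) as shown on the slide.
R1 = [
    (1, 12, 23), (1, 13, 23), (1, 14, 23),
    (2, 10, 25),
    (5, 12, 23), (5, 13, 23), (5, 14, 23),
]


def bundles_on_first_attribute(relation: List[Tuple[int, ...]]) -> List[List[int]]:
    """Bundle the values of the first attribute whose partitions are symmetric."""
    # Partition: tuples with the same first value, projected on the rest.
    partitions: Dict[int, Set[Tuple[int, ...]]] = defaultdict(set)
    for first, *rest in relation:
        partitions[first].add(tuple(rest))
    # Values whose projected partitions are equal go into the same bundle.
    bundles: Dict[FrozenSet[Tuple[int, ...]], List[int]] = defaultdict(list)
    for value in sorted(partitions):
        bundles[frozenset(partitions[value])].append(value)
    return sorted(bundles.values())


a_bundles = bundles_on_first_attribute(R1)
print(a_bundles)
```

The values 1 and 5 have identical projections on (B, C), so they form one bundle, while 2 stays alone.
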
Finding the valid bundle
1. Compute a bundle for the attribute
2. Check bundle validity with future constraints
3. If there is no common value, 'backtrack'; otherwise, assign the variable the surviving values in the bundle
[Figure: bundles {1, 5, x} and {1, 5, y, z} with common values {1, 5}; R1.A is assigned {1, 5}]

R1 ⋈ R2 (compacted)
A         B               C
{1, 5}    {12, 13, 14}    {23}
{2, 4}    {10}            {25}
{6}       {13, 14}        {27}

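The validity check amounts to intersecting the candidate bundle with the bundles allowed by the future constraints, and backtracking when nothing survives. A minimal sketch using the sets from the slide (valid_bundle is an illustrative name, and returning None stands in for 'backtrack'):

```python
from typing import Hashable, Iterable, Optional, Set


def valid_bundle(
    bundle: Set[Hashable], future_bundles: Iterable[Set[Hashable]]
) -> Optional[Set[Hashable]]:
    """Return the values surviving all future constraints, or None to backtrack."""
    surviving = set(bundle)
    for other in future_bundles:
        surviving &= other          # keep only the common values
        if not surviving:
            return None             # no common value: 'backtrack'
    return surviving


# Bundles from the slide: {1, 5, x} against {1, 5, y, z}.
common = valid_bundle({1, 5, "x"}, [{1, 5, "y", "z"}])
print(common)
```
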
Experiments
• XXL library for implementation & evaluation
• Data sets
  – Random: 2 relations R1, R2 with same schema as example
    • Each relation: 10,000 tuples
    • Memory size: 4,000 tuples
    • Page size: 200 tuples
  – Real-world problem: 3 relations, 4 attributes
• Compaction rate achieved
  – Random problem: 1.48
  – Savings even with (very) preliminary implementation
  – Real-world problem: 2.26 (69 tuples in 32 nested tuples)

Outline
• Definitions
• Bundling in CSPs
• Bundling for join query computation
• Conclusions
  – Summary
  – Future research

Summary
• Dynamic bundling in finite CSPs
  – Binary and non-binary constraints
  – Produces multiple robust solutions
  – Significantly reduces cost of search at phase transition
• Application to join-query computation
Constraint Processing inspires innovative solutions to fundamental difficult problems in Databases

Future research
• CSPs
  – Only scratched the surface:
    • interchangeability + decomposition [ECAI 96]
    • partial interchangeability [AAAI 98, Wilson 05]
    • tractable structures
• Databases
  – Investigate benefit of bundling
    • Sampling operator
    • Automatic categorization of query results
    • Main-memory databases
• Constraint databases
  – Design bundling mechanisms for gap & linear constraints over intervals (spatial databases)

Thank you for your attention
Time left for questions?

Sample projects
1. Symmetry detection
   • Search, databases
2. Graduate TA Assignment Project (GTAAP)
   • Modeling, search, GUI
3. Temporal reasoning
   • STP: STP
   • TCSP: AC, search heuristics
4. Structural decompositions
   • Ca. T: efficient approximation of the hypertree decomposition
   • Ind. Set: multiple solutions, (almost) k-partite structure