Lecture 16: Probabilistic Databases Slides by Gerome Miklau Based on a tutorial by Dan Suciu 1
Today’s Agenda 2
1. Motivation
2. Probabilistic Data Semantics
3. Representation Systems
4. Complexity
Section 1 1. Motivation 3
Section 1 Motivating Applications 4
• Text extraction & record linkage
• Inconsistent data
• Ranking query answers
Section 1 Text extraction 5
Section 1 Record Linkage 6
Section 1 Inconsistent Data 7
• Goal: consistent query answers from inconsistent databases
• Applications:
  • Integration of autonomous data sources
  • Un-enforced integrity constraints
  • Temporary inconsistencies
Section 1 Repair semantics 8
Section 1 Alternative probabilistic approach 9
Section 1 Ranking query answers 10
• Database is deterministic
• Query answers are uncertain:
  • Query terms loosened due to user’s lack of understanding of the data or schema
• The query returns a ranked list of tuples; user interested in top-k
Section 1 Summary: motivating applications 11
Section 2 2. Probabilistic Data Semantics 12
Section 2 Possible worlds semantics 13
Section 2 The definition 14
Section 2 Example 15
Section 2 Tuples as Events 16
Section 2 Tuple correlation 17
Section 2 Example 18
Section 2 Query semantics 19
Section 2 Query semantics 20
Section 2 Example: Query Semantics 21
Section 2 Query semantics 22
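As a minimal sketch of query semantics over possible worlds: a probabilistic database is taken to be an explicit list of worlds with probabilities, and the probability of an answer tuple is the total probability of the worlds in which the query returns it. The helper name `answer_probabilities`, the example relation, and all probabilities below are illustrative assumptions, not taken from the slides.

```python
def answer_probabilities(worlds, query):
    """Marginal probability of each answer tuple under possible-worlds semantics.

    worlds is a list of (instance, probability) pairs whose probabilities sum
    to 1; query maps an instance (a set of tuples) to a set of answer tuples.
    """
    result = {}
    for instance, prob in worlds:
        # Run the ordinary (deterministic) query in this world and credit
        # each returned answer with the world's probability.
        for answer in query(instance):
            result[answer] = result.get(answer, 0.0) + prob
    return result


# Hypothetical database with three possible worlds over (name, city) tuples.
worlds = [
    ({("ann", "nyc"), ("bob", "sf")}, 0.5),
    ({("ann", "nyc")}, 0.25),
    ({("bob", "la")}, 0.25),
]

# Query: project onto the city column.
cities = answer_probabilities(worlds, lambda inst: {city for _, city in inst})
```

Here `cities` maps each city to the probability that it appears in the answer, so an answer that shows up in several worlds accumulates their probabilities.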
Section 3 3. Representation Systems 23
Section 3 Representation systems 24
Section 3 Representation systems 25
Section 3 Tuple independent probabilistic database 26
Section 3 Tuple Prob. -> Possible Worlds 27
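The step from tuple probabilities to possible worlds can be sketched as follows, assuming a tuple-independent table in which each tuple is present with its own probability, independently of the others. Each subset of tuples is one world, and its probability is the product of p for the present tuples and (1 - p) for the absent ones. The function name `possible_worlds` and the example table are hypothetical.

```python
from itertools import product


def possible_worlds(tuple_probs):
    """Enumerate every possible world of a tuple-independent table.

    tuple_probs maps each tuple to its marginal probability. The result maps
    each world (a frozenset of present tuples) to the world's probability.
    """
    tuples = list(tuple_probs)
    worlds = {}
    for mask in product([False, True], repeat=len(tuples)):
        prob = 1.0
        for present, t in zip(mask, tuples):
            p = tuple_probs[t]
            prob *= p if present else 1.0 - p
        world = frozenset(t for present, t in zip(mask, tuples) if present)
        worlds[world] = prob
    return worlds


# Hypothetical table: two tuples with made-up probabilities.
R = {("a",): 0.5, ("b",): 0.25}
worlds = possible_worlds(R)
```

A table with n tuples yields 2^n worlds, which is why the compact tuple-level representation is preferred over listing worlds explicitly.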
Section 3 Tuple Prob. -> Query evaluation 28
Section 3 Tuple-independent distributions 29
Section 3 Intensional database 30
Section 3 Intensional DB => Possible Worlds 31
Section 3 Possible Worlds => Intensional DB 32
Section 3 Closure under operators 33
Section 3 Summary of Intensional Databases 34
Section 4 4. Complexity 35
Section 4 Probability of boolean expressions 36
Section 4 Example 37
Section 4 Complexity of Boolean Expression Probability 38
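To make the cost concrete, here is a brute-force sketch that computes the probability of a Boolean expression over independent variables by summing the weights of all satisfying assignments. It takes time exponential in the number of variables. The function name `expression_probability`, the example expression, and its probabilities are assumptions chosen for illustration.

```python
from itertools import product


def expression_probability(expr, var_probs):
    """Sum the probabilities of all truth assignments that satisfy expr.

    expr is a function taking a dict {variable: bool}; var_probs maps each
    variable to the probability that it is true. Variables are assumed to be
    independent. Runs in time exponential in the number of variables.
    """
    names = list(var_probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if expr(assignment):
            weight = 1.0
            for name, value in assignment.items():
                p = var_probs[name]
                weight *= p if value else 1.0 - p
            total += weight
    return total


# Hypothetical expression (x AND y) OR z with made-up probabilities.
probs = {"x": 0.5, "y": 0.5, "z": 0.5}
result = expression_probability(lambda a: (a["x"] and a["y"]) or a["z"], probs)
```

For this small expression the same value follows from independence, 1 - (1 - 0.25)(1 - 0.5), but no such shortcut is available once variables are shared between clauses.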
Section 4 Query complexity 39
Section 4 Intensional query evaluation 40
Section 4 Extensional query evaluation 41
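A minimal sketch of extensional evaluation, assuming a tuple-independent database and a hypothetical Boolean query q :- R(x), S(x, y). Probabilities are computed directly along the query plan with an independent join (multiply) and an independent project (one minus the product of complements), without building a Boolean expression for the answer. The relation contents and function names are illustrative assumptions.

```python
def independent_or(probs):
    """Probability that at least one of several independent events occurs."""
    none = 1.0
    for p in probs:
        none *= 1.0 - p
    return 1.0 - none


def eval_rs_query(R, S):
    """Extensional evaluation of the Boolean query q :- R(x), S(x, y).

    R maps x -> probability and S maps (x, y) -> probability; all tuples are
    assumed independent.
    """
    per_x = []
    for x, p_r in R.items():
        # Independent project on y: at least one S(x, y) tuple is present.
        p_s = independent_or(p for (sx, _), p in S.items() if sx == x)
        # Independent join: R(x) and the S tuples for x are distinct tuples.
        per_x.append(p_r * p_s)
    # Independent project on x: groups for different x share no tuples.
    return independent_or(per_x)


# Hypothetical relations with made-up probabilities.
R = {"a": 0.5, "b": 0.5}
S = {("a", 1): 0.5, ("a", 2): 0.5, ("b", 1): 0.5}
answer = eval_rs_query(R, S)
```

The plan is valid here because every group combined by a project or join involves disjoint sets of tuples; for queries where that fails, multiplying probabilities in this way would give wrong results.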
Section 4 Query complexity 43
Section 4 Summary on query complexity 44
- Slides: 44