Probabilistic Data Management
Chapter 7: Probabilistic Query Answering (5)
Objectives
- In this chapter, you will explore the definitions of more probabilistic query types:
  - Probabilistic skyline query
  - Probabilistic reverse skyline query
Recall: Probabilistic Query Types
- Uncertain/probabilistic database
  - Probabilistic spatial queries
    - Probabilistic range query
    - Probabilistic k-nearest neighbor query
    - Probabilistic group nearest neighbor (PGNN) query
    - Probabilistic reverse k-nearest neighbor query
    - Probabilistic spatial join / similarity join
  - Preference queries
    - Probabilistic top-k query (or ranked query)
    - Probabilistic skyline query
    - Probabilistic reverse skyline query
Probabilistic Skyline on Uncertain Data
Very Large Data Bases (VLDB), 2007
Skyline Query
- Skyline definition
  - Point $X(X_1, X_2, \dots, X_d)$ dominates point $Y(Y_1, Y_2, \dots, Y_d)$ iff:
    1) $X_i \le Y_i$ for all $1 \le i \le d$;
    2) $X_j < Y_j$ for some $1 \le j \le d$.
  - Point X is a skyline point if X is not dominated by any other point.
- (Figure: skyline points)
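The dominance test and the skyline definition above can be sketched in a few lines of Python. This is a minimal quadratic-time illustration, assuming smaller values are preferred in every dimension; the names `dominates` and `skyline` and the sample points are hypothetical, not taken from the slides.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(x: Point, y: Point) -> bool:
    """True if x dominates y: x <= y in every dimension and x < y in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (a point never dominates itself)."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical 2D data: (3, 3) and (4, 4) are dominated by (2, 2).
points = [(1, 4), (2, 2), (3, 3), (4, 1), (4, 4)]
print(skyline(points))  # prints [(1, 4), (2, 2), (4, 1)]
```

The nested scan compares every pair of points, which is enough to show the definition but not how an indexed skyline algorithm would work.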
Motivation of Probabilistic Skyline Query
- Motivation example: NBA dataset
Terminology
- U, V: uncertain objects
- u, v: an instance of U or V
- V ≺ U (v ≺ u): the former dominates the latter
Pei J. et al. Probabilistic Skyline on Uncertain Data. In VLDB, 2007
Probabilistic Skyline Query
- Dominance probability: the probability that uncertain object V dominates uncertain object U
  - Continuous case: U and V are given as continuous distributions
  - Discrete case: U has $l_1$ instances and V has $l_2$ instances
- Notation: U, V are uncertain objects; u, v are instances of U or V; V ≺ U (v ≺ u) means the former dominates the latter
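The two cases of the dominance probability can be written out as follows. This is a reconstruction along the lines of the cited VLDB 2007 formulation, not a copy of the slide; the density symbols $f_U$ and $f_V$ are assumed names, and the discrete form assumes that all instances of an object are equally likely.

```latex
% Continuous case: f_U and f_V are the (assumed) probability density functions of U and V.
\Pr[V \prec U] = \int_{u} f_U(u) \left( \int_{v \prec u} f_V(v)\, dv \right) du

% Discrete case: U = \{u_1, \dots, u_{l_1}\}, V = \{v_1, \dots, v_{l_2}\}, instances equally likely.
\Pr[V \prec U] = \frac{1}{l_1 l_2} \sum_{i=1}^{l_1} \left| \{ v_j \in V \mid v_j \prec u_i \} \right|
```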
Probabilistic Skyline Query (cont'd)
- Skyline probability: the probability that uncertain object U is not dominated by any other object
  - Continuous case
  - Discrete case: U has l instances
- p-skyline: the set of uncertain objects whose skyline probability is at least p
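A sketch of the corresponding formulas, again reconstructed in the style of the cited VLDB 2007 paper rather than copied from the slide. It assumes that uncertain objects are independent, that $f_U$ and $f_V$ denote their densities (assumed symbol names), and that instances are equally likely in the discrete case.

```latex
% Continuous case.
\Pr(U) = \int_{u} f_U(u) \prod_{V \neq U} \left( 1 - \int_{v \prec u} f_V(v)\, dv \right) du

% Discrete case: U has l instances.
\Pr(U) = \frac{1}{l} \sum_{u \in U} \prod_{V \neq U}
         \left( 1 - \frac{\left| \{ v \in V \mid v \prec u \} \right|}{|V|} \right)

% p-skyline for a probability threshold p.
\{\, U \mid \Pr(U) \ge p \,\}
```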
Example of Calculating Skyline Probability
- 4 instances of A; 3 instances each of B, C, and D
- The skyline probability Pr(D) is the probability that D is not dominated by any of the other objects (A, B, and C)
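As an illustration of this kind of calculation, the sketch below evaluates the skyline probability for discrete uncertain objects. It assumes equally likely instances and independent objects; the two objects and their coordinates are hypothetical and are not the instances shown on the slide.

```python
from typing import Dict, List, Tuple

Instance = Tuple[float, ...]


def dominates(x: Instance, y: Instance) -> bool:
    """True if x dominates y (smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline_probability(name: str, objects: Dict[str, List[Instance]]) -> float:
    """Average, over the instances u of object `name`, of the probability
    that no instance of any other object dominates u."""
    instances = objects[name]
    total = 0.0
    for u in instances:
        not_dominated = 1.0
        for other, other_instances in objects.items():
            if other == name:
                continue
            dominated_by = sum(dominates(v, u) for v in other_instances)
            not_dominated *= 1.0 - dominated_by / len(other_instances)
        total += not_dominated
    return total / len(instances)


# Hypothetical data: two uncertain objects with two instances each.
objects = {"A": [(1, 1), (3, 3)], "B": [(2, 2), (4, 4)]}
print(skyline_probability("A", objects))  # prints 0.75
print(skyline_probability("B", objects))  # prints 0.25
```

For object A, instance (1, 1) is never dominated and instance (3, 3) is dominated by one of B's two instances, giving (1 + 0.5) / 2 = 0.75.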
Basic Pruning Rule
- Bounding the skyline probability: $\Pr(U_{max}) \le \Pr(U) \le \Pr(U_{min})$
- If $\Pr(U_{min}) < p$, then U can be pruned; if $\Pr(U_{max}) \ge p$, then U is in the final result
- (Figure: object U with its corner points $U_{min}$ and $U_{max}$)
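The rule can be sketched for discrete objects as follows. The sketch assumes that $U_{min}$ and $U_{max}$ are the component-wise minimum and maximum corners of U's instances, treated as virtual points, and that instances are equally likely; `classify` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Instance = Tuple[float, ...]


def dominates(x: Instance, y: Instance) -> bool:
    """True if x dominates y (smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def point_skyline_probability(x: Instance, owner: str,
                              objects: Dict[str, List[Instance]]) -> float:
    """Probability that point x is not dominated by any object other than `owner`."""
    prob = 1.0
    for name, instances in objects.items():
        if name == owner:
            continue
        dominated_by = sum(dominates(v, x) for v in instances)
        prob *= 1.0 - dominated_by / len(instances)
    return prob


def classify(owner: str, objects: Dict[str, List[Instance]], p: float) -> str:
    """Apply the bounds Pr(U_max) <= Pr(U) <= Pr(U_min) to decide without refinement."""
    instances = objects[owner]
    u_min = tuple(min(coords) for coords in zip(*instances))  # assumed min corner
    u_max = tuple(max(coords) for coords in zip(*instances))  # assumed max corner
    upper = point_skyline_probability(u_min, owner, objects)
    lower = point_skyline_probability(u_max, owner, objects)
    if upper < p:
        return "prune"
    if lower >= p:
        return "accept"
    return "refine"  # bounds are inconclusive; compute Pr(U) exactly


objects = {"A": [(1, 1), (3, 3)], "B": [(2, 2), (4, 4)]}
print(classify("A", objects, 0.5))  # prints accept
print(classify("B", objects, 0.6))  # prints prune
print(classify("B", objects, 0.3))  # prints refine
```

The bounds hold because any instance that dominates $U_{min}$ also dominates every point of U, and any instance that dominates a point of U also dominates $U_{max}$.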
Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases
ACM Conference on the Management of Data (SIGMOD), 2008
Recall: Static Skyline Problem
- Point $o(o_1, o_2, \dots, o_d)$ dominates point $p(p_1, p_2, \dots, p_d)$ iff:
  - $o_i \le p_i$ for all $1 \le i \le d$;
  - $o_j < p_j$ for some $1 \le j \le d$.
- Point o is a skyline point if o is not dominated by any other point
- (Figure: static skyline points)
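Dynamic Skyline [Dellis and Seeger, VLDB 07]
- Skyline with dynamic attributes
- Dynamic dominance: object o dynamically dominates object p with respect to query object u iff:
  - $|o_i - u_i| \le |p_i - u_i|$ for all $1 \le i \le d$;
  - $|o_j - u_j| < |p_j - u_j|$ for some j.
- The dynamic skyline query obtains all the objects in the database that are not dynamically dominated by other objects with respect to query object u
- (Figure: dominating regions of o and p in the space with axes $|o_1 - u_1|$ and $|o_2 - u_2|$)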
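Dynamic dominance is ordinary dominance applied to the per-dimension absolute distances to u, which the following minimal sketch makes explicit. The function names and sample points are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def dynamically_dominates(o: Point, p: Point, u: Point) -> bool:
    """True if o dynamically dominates p with respect to query object u."""
    dist_o = [abs(a - c) for a, c in zip(o, u)]
    dist_p = [abs(b - c) for b, c in zip(p, u)]
    return (all(x <= y for x, y in zip(dist_o, dist_p))
            and any(x < y for x, y in zip(dist_o, dist_p)))


def dynamic_skyline(points: List[Point], u: Point) -> List[Point]:
    """Points that no other point dynamically dominates with respect to u."""
    return [p for p in points
            if not any(dynamically_dominates(o, p, u) for o in points)]


# Hypothetical data: relative to u = (5, 5), point (8, 8) has distances (3, 3)
# and is dominated by (4, 6), whose distances are (1, 1).
points = [(4, 6), (8, 8), (5, 9), (2, 5)]
print(dynamic_skyline(points, (5, 5)))  # prints [(4, 6), (5, 9), (2, 5)]
```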
Reverse Skyline Query [Dellis and Seeger, VLDB 07]
- Given a query point q, a reverse skyline query obtains all the objects u such that the dynamic skyline of u includes q
- (Figure: the dynamic skyline of point b includes q, so b is a reverse skyline point of q)
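Read operationally, u is a reverse skyline point of q when no other object p dynamically dominates q with respect to u. A brute-force sketch under that reading, with hypothetical names and data:

```python
from typing import List, Sequence

Point = Sequence[float]


def dynamically_dominates(o: Point, p: Point, u: Point) -> bool:
    """True if o dynamically dominates p with respect to u."""
    dist_o = [abs(a - c) for a, c in zip(o, u)]
    dist_p = [abs(b - c) for b, c in zip(p, u)]
    return (all(x <= y for x, y in zip(dist_o, dist_p))
            and any(x < y for x, y in zip(dist_o, dist_p)))


def reverse_skyline(data: List[Point], q: Point) -> List[Point]:
    """Objects u for which q belongs to the dynamic skyline of u."""
    result = []
    for i, u in enumerate(data):
        others = [p for j, p in enumerate(data) if j != i]
        if not any(dynamically_dominates(p, q, u) for p in others):
            result.append(u)
    return result


# Hypothetical data: seen from (1, 1), the point (2, 2) is closer than q = (3, 3)
# in both dimensions, so (1, 1) is not a reverse skyline point of q.
data = [(1, 1), (2, 2), (6, 6)]
print(reverse_skyline(data, (3, 3)))  # prints [(2, 2), (6, 6)]
```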
Motivation Example
- In a laptop market, each model corresponds to a 2D point in a price and performance space
- Customers who are interested in f are very likely to be interested in a and c
- If a company wants to produce a new model, …
- (Figure: new model q; f is a model that customers prefer)
The Laptop Market Next Year
- How about the laptop market in the coming year?
  - The performance or price attribute of each model may vary
  - Monochromatic reverse skyline problem over uncertain data (MPRS)
The Bichromatic Case (BPRS)
- (Figure: data set A and data set B)
Outline
- Introduction
- Problem Definition
- Monochromatic PRS Query Processing
- Bichromatic PRS Query Processing
- Experimental Results
- Summary
Introduction
- In the context of uncertain databases, each uncertain object o is usually modeled as an uncertainty region UR(o)
- An uncertain object can reside anywhere within its uncertainty region, following any data distribution
Monochromatic Probabilistic Reverse Skyline (MPRS) Query
- MPRS query
  - Given: a d-dimensional uncertain database D, a query object q, and a probability threshold $\alpha \in (0, 1]$
  - An MPRS query retrieves all the objects $u \in D$ such that u is a reverse skyline point of q with probability greater than or equal to $\alpha$, that is, $P_{MPRS}(u) \ge \alpha$
Bichromatic Probabilistic Reverse Skyline (BPRS) Query
- BPRS query
  - Given: two d-dimensional uncertain databases A and B, a query object q, and a probability threshold $\alpha \in (0, 1]$
  - A BPRS query obtains all the objects $u \in A$ such that u is a reverse skyline point of q in B with probability greater than or equal to $\alpha$, that is, $P_{BPRS}(u) \ge \alpha$
Linear Scan Method
- For each object u in uncertain database D (or A in the bichromatic case):
  - sequentially scan the objects in D (or B) to calculate the probability $P_{MPRS}(u)$ (or $P_{BPRS}(u)$)
  - return object u as a PRS answer if $P_{MPRS}(u) \ge \alpha$ (or $P_{BPRS}(u) \ge \alpha$)
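A minimal sketch of the linear scan for the monochromatic case. It assumes that each uncertain object is represented by a list of equally likely sampled instances and that objects are independent, so the probability is evaluated as an average over instances rather than as an integral over UR(u); `prs_probability`, `linear_scan_mprs`, and the data are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]
UncertainObject = List[Point]  # assumed representation: equally likely samples


def dynamically_dominates(o: Point, p: Point, u: Point) -> bool:
    """True if o dynamically dominates p with respect to u."""
    dist_o = [abs(a - c) for a, c in zip(o, u)]
    dist_p = [abs(b - c) for b, c in zip(p, u)]
    return (all(x <= y for x, y in zip(dist_o, dist_p))
            and any(x < y for x, y in zip(dist_o, dist_p)))


def prs_probability(i: int, db: List[UncertainObject], q: Point) -> float:
    """Probability that object db[i] is a reverse skyline point of q."""
    total = 0.0
    for u_inst in db[i]:
        q_survives = 1.0  # probability that no other object dominates q w.r.t. u_inst
        for j, other in enumerate(db):
            if j == i:
                continue
            hits = sum(dynamically_dominates(p_inst, q, u_inst) for p_inst in other)
            q_survives *= 1.0 - hits / len(other)
        total += q_survives
    return total / len(db[i])


def linear_scan_mprs(db: List[UncertainObject], q: Point, alpha: float) -> List[int]:
    """Indices of objects whose probability is at least the threshold alpha."""
    return [i for i in range(len(db)) if prs_probability(i, db, q) >= alpha]


# Hypothetical database of two uncertain objects with two samples each.
db = [[(1, 1), (2, 2)], [(2, 2), (6, 6)]]
print(prs_probability(0, db, (3, 3)))    # prints 0.5
print(prs_probability(1, db, (3, 3)))    # prints 0.75
print(linear_scan_mprs(db, (3, 3), 0.6))  # prints [1]
```

Every object is compared against every other object, which is the cost the pruning techniques aim to avoid.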
Pruning Techniques
- Spatial Pruning
- Probabilistic Pruning
Spatial Pruning
- Assume uncertain object p is an MPRS candidate and $N_p$ is the farthest point in UR(p) to q
- Point $M_p$ is the middle point between q and $N_p$
- Any object o fully contained in the pruning region can be safely pruned
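The slide text does not spell out the shape of the pruning region, so the sketch below rests on stated assumptions: uncertainty regions are axis-aligned boxes, UR(p) lies strictly on one side of q in every dimension, and the pruning region is the open orthant that starts at $M_p$ and extends away from q. Under those assumptions every position of p dynamically dominates q with respect to every position of o inside the region. All names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Sequence[float]
Box = Tuple[Point, Point]  # (lower corner, upper corner), an assumed representation


def farthest_corner(box: Box, q: Point) -> Tuple[float, ...]:
    """Corner N_p of the box that is farthest from q in every dimension."""
    lo, hi = box
    return tuple(h if abs(h - c) >= abs(l - c) else l for l, h, c in zip(lo, hi, q))


def in_pruning_region(o_box: Box, p_box: Box, q: Point) -> bool:
    """True if UR(o) lies entirely inside the pruning region defined by p and q."""
    p_lo, p_hi = p_box
    # Assumption check: UR(p) must not touch or straddle q in any dimension.
    if any(pl <= c <= ph for pl, ph, c in zip(p_lo, p_hi, q)):
        return False
    n_p = farthest_corner(p_box, q)
    m_p = tuple((a + c) / 2 for a, c in zip(n_p, q))  # midpoint between q and N_p
    o_lo, o_hi = o_box
    for l, h, m, c in zip(o_lo, o_hi, m_p, q):
        if m > c and l <= m:  # region extends upward from M_p in this dimension
            return False
        if m < c and h >= m:  # region extends downward from M_p in this dimension
            return False
    return True


# Hypothetical 2D example: q at the origin, UR(p) = [2, 4] x [2, 4], so M_p = (2, 2).
q = (0.0, 0.0)
p_box = ((2.0, 2.0), (4.0, 4.0))
print(in_pruning_region(((5.0, 5.0), (6.0, 6.0)), p_box, q))  # prints True
print(in_pruning_region(((1.0, 5.0), (3.0, 6.0)), p_box, q))  # prints False
```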
Probabilistic Pruning
- For uncertain object o, we pre-compute an inner rectangle, called the $(1 - \beta)$-hyperrectangle $UR_{1-\beta}(o)$, such that o is located in $UR_{1-\beta}(o)$ with probability $(1 - \beta)$, where $\beta \in [0, \alpha)$
- Any object o whose $UR_{1-\beta}(o)$ is completely contained in the pruning region can be safely pruned
- (Figure: $(1 - \beta)$-hyperrectangle $UR_{1-\beta}(o)$)
Framework for PRS
- Indexing Phase
  - Construct a multidimensional index (e.g., R-tree) over the uncertain data
- Pruning Phase
  - Traverse the index and perform the spatial and probabilistic pruning
- Refinement Phase
  - Refine the PRS candidates and return the answer set
MPRS Query Processing
- Traversal of the index
  - For each encountered entry/object $e_i$ in the nodes, we check whether it is fully contained in the pruning regions defined by the candidates seen so far (spatial pruning)
  - In addition, for each encountered object o, we apply probabilistic pruning by considering its $(1 - \beta)$-hyperrectangle $UR_{1-\beta}(o)$
MPRS Query Processing (cont.)
- Refinement
  - Only objects that intersect with the refinement region are considered
MPRS via Pre-Computation
BPRS Query Processing
- (Figure: index construction and index traversal)
Experimental Evaluation
- Experimental settings
  - Synthetic data sets
    - Generate center location $C_o$ of uncertain object o in a data space $[0, 1000]^d$
    - Produce radius $r_o \in [r_{min}, r_{max}]$ for uncertainty region UR(o)
    - Randomly generate a hyperrectangle within the sphere centered at $C_o$ with radius $r_o$
    - Four types of data sets: lUrU, lUrG, lSrU, lSrG
  - Measures
    - Filtering time (including CPU time and I/O cost)
    - Speed-up ratio compared with the linear scan method
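The three generation steps can be sketched as below. The slides do not say how the center, radius, or hyperrectangle are drawn, so this sketch assumes uniform draws for the center and radius and picks random half-extents whose Euclidean norm does not exceed the radius, which keeps every corner inside the sphere. The function name and parameter values are hypothetical.

```python
import math
import random
from typing import List, Tuple

Box = Tuple[List[float], List[float]]


def generate_uncertain_object(d: int, r_min: float, r_max: float,
                              rng: random.Random,
                              space: float = 1000.0) -> Tuple[List[float], float, Box]:
    """Return (center, radius, box) for one synthetic uncertain object."""
    center = [rng.uniform(0.0, space) for _ in range(d)]   # assumed uniform location
    radius = rng.uniform(r_min, r_max)                     # assumed uniform radius
    direction = [rng.random() + 1e-9 for _ in range(d)]    # positive half-extent shape
    norm = math.sqrt(sum(x * x for x in direction))
    scale = rng.uniform(0.0, radius) / norm                # corner distance <= radius
    half = [x * scale for x in direction]
    lo = [c - h for c, h in zip(center, half)]
    hi = [c + h for c, h in zip(center, half)]
    return center, radius, (lo, hi)


rng = random.Random(0)  # fixed seed so the run is repeatable
center, radius, (lo, hi) = generate_uncertain_object(3, 0.0, 5.0, rng)
corner_distance = math.sqrt(sum((h - c) ** 2 for h, c in zip(hi, center)))
print(corner_distance <= radius + 1e-9)  # prints True
```

The box may extend slightly outside the data space near its boundary; whether the original generator clips such boxes is not stated.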
MPRS Query Performance
- (Figure: lUrU data set, data size = 100K, dimensionality d = 3)
BPRS Query Performance
- (Figure: lUrG – lUrG data sets, dimensionality d = 3)
Summary
- MPRS and BPRS queries over uncertain data
- Spatial and probabilistic pruning
- PRS query processing with pre-computation
- Experimental evaluation