Mining Favorable Facets
Raymond Chi-Wing Wong (The Chinese University of Hong Kong)
Jian Pei (Simon Fraser University)
Ada Wai-Chee Fu (The Chinese University of Hong Kong)
Ke Wang (Simon Fraser University)
Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong
Outline
1. Introduction
2. Skyline
3. Algorithm
4. Empirical Study
5. Conclusion
1. Introduction
Suppose we want to look for a vacation package among 3 packages. We want a cheaper package, and we want a higher hotel-class.

Package ID  Price  Hotel-class
a           1000   4
b           2400   1
c           3000   5

Suppose we compare package a and package b. We know that package a is “better” than package b because
1. the price of package a is lower, and
2. the hotel-class of package a is higher.
We say that package a “dominates” package b. Thus, we do not need to consider package b.
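The dominance test on this slide can be sketched in a few lines of Python. This is an illustration, not code from the paper: the tuple layout `(price, hotel_class)` and the name `dominates` are our own, and the requirement that the dominating package be strictly better on at least one attribute is the usual skyline convention, which the slide assumes implicitly.

```python
# Packages from the slide: name -> (price, hotel_class).
# A lower price is better; a higher hotel-class is better.
PACKAGES = {"a": (1000, 4), "b": (2400, 1), "c": (3000, 5)}


def dominates(p, q):
    """True if package p dominates package q: p is no worse than q on every
    attribute and strictly better on at least one."""
    p_price, p_class = p
    q_price, q_class = q
    no_worse = p_price <= q_price and p_class >= q_class
    strictly_better = p_price < q_price or p_class > q_class
    return no_worse and strictly_better


print(dominates(PACKAGES["a"], PACKAGES["b"]))  # True
print(dominates(PACKAGES["a"], PACKAGES["c"]))  # False
print(dominates(PACKAGES["c"], PACKAGES["a"]))  # False
```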
Now suppose we compare package a and package c. We know that
1. package a has a cheaper price, but
2. package c has a higher hotel-class.
So we cannot determine
1. whether package a is better than package c (i.e., package a dominates package c), or
2. whether package c is better than package a (i.e., package c dominates package a).
Package a is NOT dominated by any other package, and package c is NOT dominated by any other package. Thus, packages a and c are the “best” possible choices.
We call packages a and c skyline points: points that are not dominated by any other point.
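A naive skyline computation follows directly from this definition: keep every point that no other point dominates. The sketch below uses our own helper names, runs in quadratic time with no indexing, and reproduces the result for packages a, b and c.

```python
PACKAGES = {"a": (1000, 4), "b": (2400, 1), "c": (3000, 5)}


def dominates(p, q):
    """p dominates q: no worse on price (lower) and hotel-class (higher),
    and strictly better on at least one of them."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    return no_worse and (p[0] < q[0] or p[1] > q[1])


def skyline(packages):
    """Naive O(n^2) skyline: keep each package that no other package dominates."""
    return sorted(
        name
        for name, p in packages.items()
        if not any(dominates(q, p) for other, q in packages.items() if other != name)
    )


print(skyline(PACKAGES))  # ['a', 'c']
```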
Suppose we want to look for a vacation package among 6 packages, each of which also belongs to a hotel-group.

Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)

We still want a cheaper package and a higher hotel-class. How about the hotel-group? Different customers may have different preferences on hotel-group. (Here “X < Y” means that hotel-group X is preferred to hotel-group Y.)
Suppose a customer has the preference H < T < M. The skyline points are packages a and c.
Suppose another customer has the preference H < M < T. The skyline points are packages a, c and e.
In other words, different preferences give different skyline points.
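The sketch below extends dominance with the hotel-group attribute, reading “X < Y” as “X is preferred to Y” (the reading that reproduces the skylines on the slide). Two hotel-groups with no stated order are treated as incomparable, so neither package can dominate the other through that attribute. The helper `chain_to_pairs` is hypothetical and only handles a single chain such as `H<T<M`.

```python
from itertools import combinations

# name -> (price, hotel_class, hotel_group)
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def chain_to_pairs(chain):
    """'H<T<M' -> {('H','T'), ('H','M'), ('T','M')}; (x, y) means x is preferred to y."""
    groups = chain.split("<") if chain else []
    return set(combinations(groups, 2))


def dominates(p, q, prefers):
    """p dominates q under the hotel-group preference pairs `prefers`."""
    p_price, p_class, p_group = p
    q_price, q_class, q_group = q
    group_ok = p_group == q_group or (p_group, q_group) in prefers
    if not (p_price <= q_price and p_class >= q_class and group_ok):
        return False
    # Strictly better somewhere: numerically, or via a strictly preferred group.
    return p_price < q_price or p_class > q_class or p_group != q_group


def skyline(packages, prefers):
    return sorted(
        name
        for name, p in packages.items()
        if not any(dominates(q, p, prefers)
                   for other, q in packages.items() if other != name)
    )


print(skyline(PACKAGES, chain_to_pairs("H<T<M")))  # ['a', 'c']
print(skyline(PACKAGES, chain_to_pairs("H<M<T")))  # ['a', 'c', 'e']
print(skyline(PACKAGES, set()))                    # ['a', 'c', 'e', 'f']
```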
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers.
Customer  Preference on hotel-group  Skyline
Alice     T < M                      {a, c}
Bob       No special preference      {a, c, e, f}
Chris     H < M                      {a, c, e}
David     H < M < T                  {a, c, e}
Emily     H < T < M                  {a, c}
Fred      M < T                      {a, c, e, f}

What preferences make package f a skyline point?
Preferences: no special preference, M < T, …
Bob and Fred are the potential customers for package f.
Now suppose Mozilla wants to promote package e instead.
What preferences make package e a skyline point?
Preferences: no special preference, H < M < T, M < T, …
Bob, Chris, David and Fred are the potential customers for package e.
Problem: Given a package, we want to find the preferences (or conditions) under which this package is a skyline point. We call these the favorable facets of the package.
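The customer table can be checked with the same kind of sketch: compute each customer's skyline and keep the customers whose skyline contains the package. The customer-to-preference mapping follows the table of customers; the function names are hypothetical, and “X < Y” is again read as “X is preferred to Y”.

```python
from itertools import combinations

PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
# Customer -> preference chain on hotel-group ("" = no special preference).
CUSTOMERS = {
    "Alice": "T<M", "Bob": "", "Chris": "H<M",
    "David": "H<M<T", "Emily": "H<T<M", "Fred": "M<T",
}


def chain_to_pairs(chain):
    """'H<M<T' -> {('H','M'), ('H','T'), ('M','T')}; (x, y): x preferred to y."""
    return set(combinations(chain.split("<"), 2)) if chain else set()


def dominates(p, q, prefers):
    group_ok = p[2] == q[2] or (p[2], q[2]) in prefers
    if not (p[0] <= q[0] and p[1] >= q[1] and group_ok):
        return False
    return p[0] < q[0] or p[1] > q[1] or p[2] != q[2]


def skyline(prefers):
    return {name for name, p in PACKAGES.items()
            if not any(dominates(q, p, prefers)
                       for other, q in PACKAGES.items() if other != name)}


def potential_customers(package):
    """Customers whose preference keeps `package` in the skyline."""
    return [who for who, chain in CUSTOMERS.items()
            if package in skyline(chain_to_pairs(chain))]


print(potential_customers("f"))  # ['Bob', 'Fred']
print(potential_customers("e"))  # ['Bob', 'Chris', 'David', 'Fred']
```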
Naive method: lattice search
We can solve the problem by a naive method, lattice search: arrange the possible preferences on hotel-group in a lattice and compute the skyline (SKY) under each preference.
{}: SKY = {a, c, e, f}
{T < M}: SKY = {a, c}
{H < M}: SKY = {a, c, e}
{T < H}, {H < T}, {M < H}: SKY = {a, c, e, f}
…
{T < M, H < M}: SKY = {a, c}
{T < M, T < H}: SKY = {a, c}
{H < T, H < M}: SKY = {a, c, e}
{T < H, M < H}: SKY = {a, c, e, f}
{T < M, M < H}: SKY = {a, c}
…
⊤ (top of the lattice): SKY = {}
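A minimal sketch of the naive lattice search, assuming a preference is any set of ordered pairs over {T, H, M} whose transitive closure has no cycle. It evaluates the skyline once per candidate set of pairs (64 candidates here), which already hints at the cost problem. All names are our own.

```python
from itertools import combinations, permutations

PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
GROUPS = ["T", "H", "M"]
# Every ordered pair (x, y) means "hotel-group x is preferred to y".
ALL_PAIRS = list(permutations(GROUPS, 2))


def closure(pairs):
    """Transitive closure: from (x, y) and (y, z), also derive (x, z)."""
    result = set(pairs)
    while True:
        derived = {(x, z) for (x, y) in result for (y2, z) in result if y == y2}
        if derived <= result:
            return result
        result |= derived


def consistent(pairs):
    """Reject preferences whose closure contains a cycle, i.e. some (x, x)."""
    return all((g, g) not in closure(pairs) for g in GROUPS)


def dominates(p, q, prefers):
    group_ok = p[2] == q[2] or (p[2], q[2]) in prefers
    if not (p[0] <= q[0] and p[1] >= q[1] and group_ok):
        return False
    return p[0] < q[0] or p[1] > q[1] or p[2] != q[2]


def skyline(prefers):
    return {name for name, p in PACKAGES.items()
            if not any(dominates(q, p, prefers)
                       for other, q in PACKAGES.items() if other != name)}


def lattice_search(target):
    """Visit every consistent set of pairs (smallest first) and keep those
    under which `target` is a skyline point."""
    qualifying = []
    for k in range(len(ALL_PAIRS) + 1):
        for prefs in combinations(ALL_PAIRS, k):
            if consistent(prefs) and target in skyline(closure(prefs)):
                qualifying.append(set(prefs))
    return qualifying


found = lattice_search("f")
print(len(found))  # 12
print(found[:3])   # [set(), {('T', 'H')}, {('H', 'T')}]
```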
Consider package f. The preferences which qualify package f as a skyline point are {}, {T < H}, {H < T}, {M < H}, {T < H, M < H}, …
This approach has two disadvantages.
1. Computation is costly. We need to compute all skyline points for each possible preference, and there are many preferences.
2. It is difficult to interpret the results.
In the lattice, a border for f separates the preferences under which package f is a skyline point from those under which it is not.
We find that whenever the preference contains “T < M” or “H < M”, package f is not a skyline point.
This leads us to restate the problem.
Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
For package f, we can say that “T < M” or “H < M” is a minimal disqualifying condition (MDC).
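The MDC of package f can also be read off the lattice by brute force: collect the preferences under which f is not a skyline point and keep only the minimal ones. This sketch is our own code under the same assumptions as a plain lattice search (preferences are cycle-free sets of ordered pairs), and it recovers “T < M” or “H < M”. Minimality here is taken over the pairs as written, not over their transitive closure, so for other packages the result may differ from the paper's MDC.

```python
from itertools import combinations, permutations

PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
GROUPS = ["T", "H", "M"]
ALL_PAIRS = list(permutations(GROUPS, 2))  # (x, y): hotel-group x preferred to y


def closure(pairs):
    """Transitive closure of a set of preference pairs."""
    result = set(pairs)
    while True:
        derived = {(x, z) for (x, y) in result for (y2, z) in result if y == y2}
        if derived <= result:
            return result
        result |= derived


def consistent(pairs):
    return all((g, g) not in closure(pairs) for g in GROUPS)


def dominates(p, q, prefers):
    group_ok = p[2] == q[2] or (p[2], q[2]) in prefers
    if not (p[0] <= q[0] and p[1] >= q[1] and group_ok):
        return False
    return p[0] < q[0] or p[1] > q[1] or p[2] != q[2]


def skyline(prefers):
    return {name for name, p in PACKAGES.items()
            if not any(dominates(q, p, prefers)
                       for other, q in PACKAGES.items() if other != name)}


def minimal_disqualifying(target):
    """Collect every consistent preference set under which `target` is NOT a
    skyline point, then keep only those with no disqualifying proper subset."""
    bad = [frozenset(prefs)
           for k in range(len(ALL_PAIRS) + 1)
           for prefs in combinations(ALL_PAIRS, k)
           if consistent(prefs) and target not in skyline(closure(prefs))]
    return [s for s in bad if not any(other < s for other in bad)]


print(minimal_disqualifying("f"))
# [frozenset({('T', 'M')}), frozenset({('H', 'M')})]
```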
3. Algorithm
Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
How do we find the MDCs of a point?
Point q is said to quasi-dominate point p if none of the totally ordered attributes of q (here, price and hotel-class) is worse than the corresponding attribute of p.
E.g., package a quasi-dominates package f because
1. package a has a lower (better) price than package f, and
2. package a has a higher (better) hotel-class than package f.
If package a quasi-dominates package f, we define R^a_f as the condition on hotel-group under which a dominates f: R^a_f = {T < M}.
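Quasi-dominance and the condition R^q_p can be sketched as follows. The encoding is an assumption: we represent R^q_p as a set of (preferred, less-preferred) hotel-group pairs, empty when q and p share a hotel-group (q then dominates p under every preference). We also assume no two packages have identical attribute values.

```python
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def quasi_dominates(q, p):
    """q quasi-dominates p: q is no worse than p on every totally ordered
    attribute (lower price, higher hotel-class); hotel-group is ignored."""
    return q[0] <= p[0] and q[1] >= p[1]


def condition(q, p):
    """R^q_p (our encoding): the hotel-group preference under which q would
    dominate p. Same group: no extra condition is needed (empty set)."""
    return frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})


f = PACKAGES["f"]
for name, q in PACKAGES.items():
    if name != "f" and quasi_dominates(q, f):
        print(name, set(condition(q, f)))
# a {('T', 'M')}
# c {('H', 'M')}
```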
Two algorithms
- MDC-O: Computing MDC On-the-fly
  - Does not store the MDCs of points
  - Computes the MDC of a given point on-the-fly
- MDC-M: A Materialization Method
  - Stores the MDCs of all points
Indexing method for speed-up
- R*-tree
3.1 MDC-O: Computing MDC On-the-fly
On-the-fly algorithm
Given: a data point p
Variable: MDC(p), the minimal disqualifying condition of p
Algorithm:
  MDC(p) ← ∅
  For each data point q which quasi-dominates p:
    If MDC(p) does not contain R^q_p, insert R^q_p into MDC(p)
  Return MDC(p)
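MDC-O can be sketched in Python as below. We interpret “MDC(p) does not contain R^q_p” as “no stored condition is a subset of R^q_p”, and we also drop stored conditions that become redundant so that MDC(p) stays minimal; that interpretation and the helper names are assumptions. The returned list is a disjunction: p is not a skyline point if a customer's preference satisfies any one of its conditions.

```python
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def quasi_dominates(q, p):
    return q[0] <= p[0] and q[1] >= p[1]


def condition(q, p):
    """R^q_p: empty if q and p share a hotel-group, else {(group(q), group(p))}."""
    return frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})


def insert_minimal(mdc, cond):
    """Insert cond unless some stored condition is already a subset of it;
    drop stored conditions that cond makes redundant (keeps MDC minimal)."""
    if any(old <= cond for old in mdc):
        return mdc
    return [old for old in mdc if not cond <= old] + [cond]


def mdc_on_the_fly(p_name, packages):
    """MDC-O: scan all points, collecting R^q_p for each q that quasi-dominates p."""
    p = packages[p_name]
    mdc = []
    for q_name, q in packages.items():
        if q_name != p_name and quasi_dominates(q, p):
            mdc = insert_minimal(mdc, condition(q, p))
    return mdc


print(mdc_on_the_fly("f", PACKAGES))  # [frozenset({('T', 'M')}), frozenset({('H', 'M')})]
print(mdc_on_the_fly("a", PACKAGES))  # []
```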
3.2 MDC-M: A Materialization Method
Materialization algorithm
Variable: MDC(p), the minimal disqualifying condition of p
Algorithm:
  For each data point p:
    MDC(p) ← ∅
    For each data point q which quasi-dominates p:
      If MDC(p) does not contain R^q_p, insert R^q_p into MDC(p)
    Store MDC(p)
Query algorithm
Given: a data point p
Algorithm: return the stored MDC(p)
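MDC-M moves the same loop offline. The sketch below materializes MDC(p) for every package in a Python dictionary, so a query becomes a lookup. As an assumption, “does not contain” is read as a subset check, conditions are sets of (preferred, less-preferred) hotel-group pairs, and all helper names are hypothetical.

```python
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def quasi_dominates(q, p):
    return q[0] <= p[0] and q[1] >= p[1]


def condition(q, p):
    return frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})


def insert_minimal(mdc, cond):
    if any(old <= cond for old in mdc):
        return mdc
    return [old for old in mdc if not cond <= old] + [cond]


def materialize(packages):
    """MDC-M, offline phase: compute and store MDC(p) for every point p."""
    store = {}
    for p_name, p in packages.items():
        mdc = []
        for q_name, q in packages.items():
            if q_name != p_name and quasi_dominates(q, p):
                mdc = insert_minimal(mdc, condition(q, p))
        store[p_name] = mdc
    return store


def query(store, p_name):
    """MDC-M, online phase: a query is just a lookup."""
    return store[p_name]


STORE = materialize(PACKAGES)
print(query(STORE, "e"))  # [frozenset({('T', 'M')})]
print(query(STORE, "d"))  # [frozenset()]
```

In this encoding, an empty MDC (packages a and c) means the package is a skyline point under every preference, while an MDC holding the empty condition (packages b and d) means a package of the same hotel-group is better, so no preference can make it a skyline point.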
4. Empirical Study
Datasets
- Synthetic dataset
- Real datasets (from UCI)
  - Nursery dataset
  - Automobile dataset
Default values (synthetic)
- No. of tuples = 500K
- No. of numeric dimensions = 3
- No. of categorical dimensions = 1
- No. of values in a nominal dimension = 20
Without indexing:
- MDC-O: slowest search time
- MDC-M: faster search time; storage of MDCs: 8 MB
With indexing:
- MDC-O and MDC-M: fast search time
Automobile dataset: three car models

Car         MDC
Honda       “Toyota < Honda”
Mitsubishi  “Honda < Mitsubishi” or “Toyota < Mitsubishi”
Toyota      (no MDC)

- Honda: a salesperson should NOT promote this car to a customer who prefers Toyota to Honda.
- Mitsubishi: a salesperson should promote this car to a customer who prefers Mitsubishi to the others.
- Toyota: a salesperson should promote this car to ANY customer.
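The MDCs in this table can drive a simple promotion check. This is a hypothetical helper, not part of the paper: a car is promoted unless the customer's stated preferences, after taking their transitive closure, satisfy one of the car's conditions. A pair (x, y) encodes “x < y”, i.e. the customer prefers make x to make y.

```python
# MDCs reported on the slide for the Automobile dataset.
MDC = {
    "Honda": [frozenset({("Toyota", "Honda")})],
    "Mitsubishi": [frozenset({("Honda", "Mitsubishi")}),
                   frozenset({("Toyota", "Mitsubishi")})],
    "Toyota": [],
}


def closure(pairs):
    """Transitive closure: (x, y) and (y, z) imply (x, z)."""
    result = set(pairs)
    while True:
        derived = {(x, z) for (x, y) in result for (y2, z) in result if y == y2}
        if derived <= result:
            return result
        result |= derived


def should_promote(car, preference):
    """Promote `car` unless the customer's preference (after transitive closure)
    satisfies one of the car's minimal disqualifying conditions."""
    implied = closure(preference)
    return not any(cond <= implied for cond in MDC[car])


likes_toyota_over_honda = {("Toyota", "Honda")}
print([car for car in MDC if should_promote(car, likes_toyota_over_honda)])
# ['Mitsubishi', 'Toyota']
```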
5. Conclusion
- Skyline
- Favorable facets
  - Minimal disqualifying condition
- Algorithm
  - On-the-fly
  - Materialization
- Empirical study
Q&A
Poster board
- Title: Mining Favorable Facets
- Date: Monday, 13th August
- Place: poster board number 31
3.3 Speedup
- Build an R*-tree based on the totally ordered attributes.
- For each point p:
  - MDC(p) ← ∅
  - Perform a range search from 0 to the value of dimension D of p, for each dimension D.
  - For each point q found in the range search, insert R^q_p into MDC(p).
All points (e.g., point q) in the region between the origin 0 and point p quasi-dominate point p.
[Figure: points p and q in a 2-D space with origin 0; on each axis, a smaller value is a better value.]
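Python's standard library has no R*-tree, so the sketch below uses a sorted list with `bisect` as a toy stand-in for the range search. We assume hotel-class is at most 5 (`MAX_CLASS`, hypothetical) so that every attribute maps to a non-negative cost where smaller is better; the box [0, cost(p)] then contains exactly the points that quasi-dominate p, plus p itself. Unlike the slide, we also apply the “does not contain” check from MDC-O (read as a subset test) when inserting R^q_p.

```python
from bisect import bisect_right

PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
MAX_CLASS = 5  # assumption: hotel-class is at most 5


def to_cost(pkg):
    """Map a package to a point where smaller is better in every dimension and
    values start at 0, so 'quasi-dominates p' becomes 'lies in [0, cost(p)]'."""
    price, hotel_class, _group = pkg
    return (price, MAX_CLASS - hotel_class)


class SortedIndex:
    """Toy stand-in for an R*-tree: points sorted on the first cost dimension,
    so a box query [0, upper] scans only a prefix and filters the rest."""

    def __init__(self, packages):
        self.entries = sorted((to_cost(pkg), name) for name, pkg in packages.items())
        self.keys = [cost[0] for cost, _ in self.entries]

    def range_search(self, upper):
        end = bisect_right(self.keys, upper[0])
        return [name for cost, name in self.entries[:end]
                if all(0 <= c <= u for c, u in zip(cost, upper))]


def condition(q, p):
    return frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})


def insert_minimal(mdc, cond):
    if any(old <= cond for old in mdc):
        return mdc
    return [old for old in mdc if not cond <= old] + [cond]


def mdc_with_index(p_name, packages, index):
    p = packages[p_name]
    mdc = []
    for q_name in index.range_search(to_cost(p)):
        if q_name != p_name:  # the box query also returns p itself
            mdc = insert_minimal(mdc, condition(packages[q_name], p))
    return mdc


INDEX = SortedIndex(PACKAGES)
print(INDEX.range_search(to_cost(PACKAGES["f"])))  # ['a', 'c', 'f']
print(mdc_with_index("f", PACKAGES, INDEX))  # [frozenset({('T', 'M')}), frozenset({('H', 'M')})]
```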