Ranking in DB
Laks V. S. Lakshmanan, Dept. of CS, UBC
Why ranking in query answering? 1/3
• Multimedia data – fuzzy querying: e.g., "find top 2 red objects with a soft texture".
• Per-attribute ranked lists are combined into an overall score (values as they survive in the slide):
    Obj    A     D     C     B     E
    Score  0.9   0.8   0.4   0.3   0.1
    Obj    D     B     A     E     C
    Score  0.85  0.80  0.75  0.60
10/22/2021
Why ranking? 2/3
• IR: "find top 5 documents relevant to 'computational', 'neuroscience' and 'brain theory'".
  – IR systems maintain full-text indexes: inverted lists of docs w.r.t. each keyword.
  – Same Q/A paradigm as before.
• Buying a home: several criteria – price, location, area, #BRs, school district. ORDER BY query in SQL.
• Finding hotels while traveling.
Why ranking? 3/3
• Data stream, e.g., of network flow data: "find 10 users with the max. BW consumption and max. #packets communicated".
  – Score may be a complex aggregation of these two measures.
• In a social net, find 5 items tagged as most relevant to "lawn mowing" and belonging to users socially close to the seeker.
• And now, find top-k recs (recommender systems).
• etc.
• Fagin et al. – pioneering papers PODS '96, '01, JCSS 2003. Burgeoned into a field now.
• Focus on the middleware algorithm which, given a score combo. function, computes top-k answers by probing diff. subsystems (or ranked lists).
Computational model
• Naïve method.
• How to compute top-K efficiently?
• Access methods:
  – Sorted access (sequential access) [SA].
  – Random access [RA].
• Diff. optimization metrics:
  – Overall running time of algorithm.
  – SA < RA in cost: minimize RAs.
  – RA not possible (#): avoid RAs.
  – Combined optimization.
• Has led to a variety of algorithms.
• Memory vs. disk model.
• For the most part, assume score agg. is a monotone function; use SUM in examples.
(#: typical in IR systems.)
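As a point of reference for the "naïve method" bullet, here is a minimal Python sketch that reads every list in full, aggregates with SUM, and keeps the K best. The function name naive_top_k and the toy lists are illustrative assumptions, not taken from the slides.

```python
import heapq

def naive_top_k(lists, k, t=sum):
    """Naive baseline: scan every list completely, then rank by t()."""
    per_object = {}
    for lst in lists:
        for obj, score in lst:
            per_object.setdefault(obj, []).append(score)
    # round() only hides floating-point noise in the decimal scores.
    totals = [(obj, round(t(scores), 2)) for obj, scores in per_object.items()]
    return heapq.nlargest(k, totals, key=lambda e: e[1])

# Hypothetical toy input: two ranked lists over objects a, b, c.
toy_lists = [
    [("a", 0.9), ("b", 0.5), ("c", 0.1)],
    [("b", 0.8), ("c", 0.7), ("a", 0.2)],
]
top2 = naive_top_k(toy_lists, 2)
print(top2)
```

The cost is one full pass over all m lists regardless of K, which is what the algorithms on the following slides try to avoid.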
Fagin's Algorithm (FA)
• m lists sorted by descending scores.
• Access (SA) all lists in parallel.
  – For each new object x seen, fetch its scores from the other lists by RA. Overall score t(x) = t(x1, …, xm). Store (obj, score) in set Y.
  – Remember each object seen (under SA) in all m lists in set H.
• Repeat until |H| >= K.
• Sort Y in descending order of scores, breaking ties arbitrarily, and output top K.
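The FA steps can be sketched in Python as follows, assuming SUM as t() and the four ranked lists read off the example slides. F's score in L1 is not legible in the slides, so 0.50 is a placeholder assumption; the names fagin_top_k and LISTS are illustrative only.

```python
# Ranked lists from the slides' running example (descending score).
# Assumption: F's L1 score is missing in the slides; 0.50 is a placeholder.
L1 = [("H", .95), ("B", .90), ("E", .85), ("C", .80), ("G", .75),
      ("I", .70), ("D", .65), ("A", .60), ("J", .55), ("F", .50)]
L2 = [("J", 1.0), ("C", .95), ("G", .85), ("H", .80), ("E", .75),
      ("B", .75), ("F", .60), ("A", .50), ("D", .40), ("I", .30)]
L3 = [("C", .95), ("J", .80), ("D", .70), ("H", .65), ("G", .60),
      ("B", .55), ("I", .50), ("E", .45), ("F", .45), ("A", .30)]
L4 = [("E", 1.0), ("G", .95), ("H", .90), ("B", .85), ("D", .80),
      ("C", .70), ("A", .65), ("I", .55), ("F", .50), ("J", .30)]
LISTS = [L1, L2, L3, L4]

def fagin_top_k(lists, k):
    """Sketch of FA with SUM as the monotone aggregate t()."""
    ra = [dict(lst) for lst in lists]     # random-access index per list
    m = len(lists)
    seen_in = {}                          # object -> lists where SA met it
    y = {}                                # set Y: object -> overall score
    h = set()                             # set H: met under SA in all m lists
    depth = 0
    max_depth = max(len(lst) for lst in lists)
    while len(h) < k and depth < max_depth:
        for i, lst in enumerate(lists):   # one step of parallel SA
            if depth >= len(lst):
                continue
            obj, _ = lst[depth]
            if obj not in y:              # new object: RA into every list
                y[obj] = round(sum(index[obj] for index in ra), 2)
            seen_in.setdefault(obj, set()).add(i)
            if len(seen_in[obj]) == m:
                h.add(obj)
        depth += 1
    # Sort Y by descending score (ties keep discovery order), cut at k.
    return sorted(y.items(), key=lambda e: -e[1])[:k], depth

top4, stop_depth = fagin_top_k(LISTS, 4)
print(top4, stop_depth)
```

On this input the sketch stops once four objects have been met in all four lists. B and E tie for fourth place, and which of the two is reported depends on the tie-breaking rule.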
Example of FA (K = 4, t = SUM)
Lists, sorted by descending score:
  Depth  L1       L2       L3       L4
  1      H(0.95)  J(1.00)  C(0.95)  E(1.00)
  2      B(0.90)  C(0.95)  J(0.80)  G(0.95)
  3      E(0.85)  G(0.85)  D(0.70)  H(0.90)
  4      C(0.80)  H(0.80)  H(0.65)  B(0.85)
  5      G(0.75)  E(0.75)  G(0.60)  D(0.80)
  6      I(0.70)  B(0.75)  B(0.55)  C(0.70)
  7      D(0.65)  F(0.60)  I(0.50)  A(0.65)
  8      A(0.60)  A(0.50)  E(0.45)  I(0.55)
  9      J(0.55)  D(0.40)  F(0.45)  F(0.50)
  10     ?        I(0.30)  A(0.30)  J(0.30)
  (The tenth L1 entry is not legible in the source.)
Progress of the algorithm:
• Depth 1: new objects H, J, C, E; RA gives H = 3.30, J = 2.65, C = 3.40, E = 3.05.
• Depth 2: new objects B = 3.05, G = 3.15.
• Depth 3: new object D = 2.55.
• Depth 4: object H has now been seen in all 4 lists; set H = {H}.
• Depth 5: set H = {H, G}.
• Depth 6: new object I = 2.05; set H = {H, G, B, C}, |H| = 4, so SA stops.
• Answers seen in >= 1 list, i.e., Y (unsorted): B 3.05, C 3.40, D 2.55, E 3.05, G 3.15, H 3.30, I 2.05, J 2.65.
• Answers seen (under SA) in all 4 lists, i.e., H: H, G, B, C.
FA Example concluded
• A, F – not seen in any list. Yet we are sure they can't make it to the top-4. Why?
• Based on where the cursors are now, what's the max. possible score for A, F?
• What assumptions are being made about t()?
• FA is shown to be optimal with very high probability [Fagin: PODS 1996].
• But it can be beaten by other algorithms on specific inputs.
• What about buffer size?
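The question about A and F can be checked with a few lines of arithmetic. This is a sketch assuming t() is SUM (monotone) and that the cursors sit at depth 6, where the example stops; the cursor scores and the 4th-best score 3.05 are read off the example slides, and the variable names are illustrative.

```python
# Scores under the four SA cursors when FA stops (depth 6 in the example).
cursor_scores = [0.70, 0.75, 0.55, 0.70]

# Lists are sorted in descending order and t() is monotone, so an object not
# yet met under SA in any list can score at most t(cursor scores).
upper_bound_unseen = round(sum(cursor_scores), 2)

kth_score = 3.05  # 4th-best overall score already in Y (B and E tie here)
cannot_qualify = upper_bound_unseen < kth_score
print(upper_bound_unseen, cannot_qualify)
```

The bound relies only on monotonicity of t() and the sort order of the lists, not on the actual scores of A and F.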
Threshold Algorithm (TA)
• Do parallel SA on all m lists.
• For each object x seen under SA in a list, fetch its scores from other lists by RA and compute overall score.
  – If |Buffer| < K, add x to Buffer;
  – else if score(x) <= k-th score in buffer, toss;
  – else replace bottom of buffer with (x, score(x)) & re-sort.
• Threshold := t(worst score seen on L1, …, worst score seen on Lm).
• Stop when threshold <= k-th score in buffer.
• Output the top-K objects & scores (in buffer).
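A minimal Python sketch of these TA steps, assuming SUM as t(), equal-length lists, and the four ranked lists read off the example slides. F's L1 score is not legible in the slides, so 0.50 is a placeholder assumption; threshold_top_k and LISTS are illustrative names.

```python
# Ranked lists from the slides' running example (descending score).
# Assumption: F's L1 score is missing in the slides; 0.50 is a placeholder.
L1 = [("H", .95), ("B", .90), ("E", .85), ("C", .80), ("G", .75),
      ("I", .70), ("D", .65), ("A", .60), ("J", .55), ("F", .50)]
L2 = [("J", 1.0), ("C", .95), ("G", .85), ("H", .80), ("E", .75),
      ("B", .75), ("F", .60), ("A", .50), ("D", .40), ("I", .30)]
L3 = [("C", .95), ("J", .80), ("D", .70), ("H", .65), ("G", .60),
      ("B", .55), ("I", .50), ("E", .45), ("F", .45), ("A", .30)]
L4 = [("E", 1.0), ("G", .95), ("H", .90), ("B", .85), ("D", .80),
      ("C", .70), ("A", .65), ("I", .55), ("F", .50), ("J", .30)]
LISTS = [L1, L2, L3, L4]

def threshold_top_k(lists, k):
    """Sketch of TA with SUM as t(); returns (buffer, stop depth, threshold)."""
    ra = [dict(lst) for lst in lists]          # random-access index per list
    buffer = {}                                # object -> overall score, size <= k
    threshold = float("inf")
    depth = 0
    for depth in range(1, len(lists[0]) + 1):  # assumes equal-length lists
        row = [lst[depth - 1] for lst in lists]
        for obj, _ in row:
            if obj in buffer:
                continue
            score = round(sum(index[obj] for index in ra), 2)
            if len(buffer) < k:
                buffer[obj] = score
            else:
                bottom = min(buffer, key=buffer.get)
                if score > buffer[bottom]:     # otherwise toss the object
                    del buffer[bottom]
                    buffer[obj] = score
        # Threshold = t(last score seen on each list at this depth).
        threshold = round(sum(s for _, s in row), 2)
        if len(buffer) == k and threshold <= min(buffer.values()):
            break
    ranked = sorted(buffer.items(), key=lambda e: -e[1])
    return ranked, depth, threshold

ranked, stop_depth, final_threshold = threshold_top_k(LISTS, 4)
print(ranked, stop_depth, final_threshold)
```

Unlike FA, the buffer never holds more than K objects. B and E tie at 3.05, so the fourth reported object may differ from the one kept in the slides.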
TA Example (K = 4, t = SUM; same lists L1 to L4 as in the FA example)
Threshold bar after each depth of parallel SA:
  Depth  x1    x2    x3    x4    T     Newly scored objects
  1      0.95  1.00  0.95  1.00  3.90  H 3.30, J 2.65, C 3.40, E 3.05
  2      0.90  0.95  0.80  0.95  3.60  B 3.05, G 3.15
  3      0.85  0.85  0.70  0.90  3.30  D 2.55
  4      0.80  0.80  0.65  0.85  3.10  (none)
  5      0.75  0.75  0.60  0.80  2.90  (none)
• Objects marked X in the slides (not in the final buffer): B, J, D.
• Buffer at the end: C 3.40, H 3.30, G 3.15, E 3.05.
• At depth 5, T = 2.90 <= 3.05 (the 4th score in the buffer) ==> can stop!
TA Remarks
TA is Instance Optimal
TA IO Proof (contd.)
Proof (contd.)
Proof (contd.)
Proof (contd.)
Proof (concluded)
No Random Access Algorithm (NRA)
• What if cost(RA) > cost(SA), or RA isn't allowed?
• Do SA on all lists in parallel. At depth d:
  – Maintain the worst (last-seen) score on each list L1, …, Lm.
  – For any object x seen in lists {1, …, i} with scores x1, …, xi:
    • Best(x) = t(x1, …, xi, worst score seen on L(i+1), …, worst score seen on Lm).
    • Worst(x) = t(x1, …, xi, 0, …, 0).
  – TopK contains the K objects with max Worst scores at depth d. Break ties using Best. M = k-th Worst score in TopK.
  – Object y is viable if Best(y) > M.
• Stop when TopK contains >= K distinct objects and no object outside TopK is viable. Return TopK.
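The NRA bookkeeping can be sketched in Python as below, assuming SUM as t(), non-negative scores, equal-length lists, and the four ranked lists read off the example slides. F's L1 score is not legible in the slides, so 0.50 is a placeholder assumption; nra_bounds and nra_top_k are illustrative names, and the bounds are recomputed from scratch at every depth for clarity rather than efficiency.

```python
# Ranked lists from the slides' running example (descending score).
# Assumption: F's L1 score is missing in the slides; 0.50 is a placeholder.
L1 = [("H", .95), ("B", .90), ("E", .85), ("C", .80), ("G", .75),
      ("I", .70), ("D", .65), ("A", .60), ("J", .55), ("F", .50)]
L2 = [("J", 1.0), ("C", .95), ("G", .85), ("H", .80), ("E", .75),
      ("B", .75), ("F", .60), ("A", .50), ("D", .40), ("I", .30)]
L3 = [("C", .95), ("J", .80), ("D", .70), ("H", .65), ("G", .60),
      ("B", .55), ("I", .50), ("E", .45), ("F", .45), ("A", .30)]
L4 = [("E", 1.0), ("G", .95), ("H", .90), ("B", .85), ("D", .80),
      ("C", .70), ("A", .65), ("I", .55), ("F", .50), ("J", .30)]
LISTS = [L1, L2, L3, L4]

def nra_bounds(lists, depth):
    """(Worst, Best) per seen object after `depth` rounds of parallel SA."""
    known = {}                                  # object -> {list index: score}
    for d in range(depth):
        for i, lst in enumerate(lists):
            obj, s = lst[d]
            known.setdefault(obj, {})[i] = s
    bottoms = [lst[depth - 1][1] for lst in lists]  # worst score seen per list
    bounds = {}
    for obj, scores in known.items():
        worst = round(sum(scores.values()), 2)  # unknown scores count as 0
        best = round(worst + sum(b for i, b in enumerate(bottoms)
                                 if i not in scores), 2)
        bounds[obj] = (worst, best)
    return bounds, round(sum(bottoms), 2)       # second value: Best of unseen

def nra_top_k(lists, k):
    """Sketch of NRA: stop when nothing outside TopK is viable."""
    for depth in range(1, len(lists[0]) + 1):
        bounds, unseen_best = nra_bounds(lists, depth)
        ranked = sorted(bounds, key=lambda o: (-bounds[o][0], -bounds[o][1]))
        top, rest = ranked[:k], ranked[k:]
        if len(top) < k:
            continue
        m_k = bounds[top[-1]][0]                # M = k-th Worst score in TopK
        viable = unseen_best > m_k or any(bounds[o][1] > m_k for o in rest)
        if not viable:
            break
    return [(o, bounds[o]) for o in top], depth

top4, stop_depth = nra_top_k(LISTS, 4)
print(top4, stop_depth)
```

Treating never-seen objects as one extra candidate whose Best is t() of the current bottoms is how this sketch reads "no object outside TopK is viable". On this input NRA reads deeper than TA, since it cannot resolve E's missing score by RA.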
NRA Example (t = SUM; same lists L1 to L4 as in the FA example)
[Worst, Best] for each object seen so far, after each depth d of parallel SA:
  Obj  d=1           d=2           d=3           d=4           d=5           d=6
  B    -             [0.90, 3.60]  [0.90, 3.35]  [1.75, 3.20]  [1.75, 3.10]  [3.05, 3.05]
  C    [0.95, 3.90]  [1.90, 3.75]  [1.90, 3.65]  [2.70, 3.55]  [2.70, 3.50]  [3.40, 3.40]
  D    -             -             [0.70, 3.30]  [0.70, 3.15]  [1.50, 3.00]  [1.50, 2.95]
  E    [1.00, 3.90]  [1.00, 3.65]  [1.85, 3.40]  [1.85, 3.30]  [2.60, 3.20]  [2.60, 3.15]
  G    -             [0.95, 3.60]  [1.80, 3.35]  [1.80, 3.25]  [3.15, 3.15]  [3.15, 3.15]
  H    [0.95, 3.90]  [0.95, 3.65]  [1.85, 3.40]  [3.30, 3.30]  [3.30, 3.30]  [3.30, 3.30]
  I    -             -             -             -             -             [0.70, 2.70]
  J    [1.00, 3.90]  [1.80, 3.65]  [1.80, 3.55]  [1.80, 3.45]  [1.80, 3.35]  [1.80, 3.20]
A and F are not yet seen through depth 6.
NRA Features
• What sort of t() do we need to assume for NRA to work correctly?
• How large can the buffers get?
• How does the amount of bookkeeping compare with TA?
• NRA is instance optimal over algorithms not making RA (and, of course, not making wild guesses).
Combined optimization
• What if we are told that cost(RA) is a known multiple of cost(SA)?
• Can we find algorithms better than NRA and TA in this case?
• Combined algorithm = CA. (See Fagin et al.'s paper for details.)
Worrying about I/O cost
• Based on Bast et al., VLDB 2006.
• Inverted lists of (itemID, score) entries in desc. score order, as usual, but on disk.
• Within each block, entries are sorted by itemID; across blocks, still in desc. score order.
• Inverted Block Index (IBI) Algorithm.
• What is an IBI?
A Motivating Example
  List 1: Doc17: 0.8, Doc78: 0.2, ·, ·
  List 2: Doc25: 0.7, Doc38: 0.5, Doc14: 0.5, Doc83: 0.5, ·, Doc17: 0.2, ·
  List 3: Doc83: 0.9, Doc17: 0.7, Doc61: 0.3, ·, ·
• Round 1 (SA on 1, 2, 3): Doc17: [0.8, 2.4]; Doc25: [0.7, 2.4]; Doc83: [0.9, 2.4]; unseen: ≤ 2.4.
• Round 2 (SA on 1, 2, 3): Doc17: [1.5, 2.0]; Doc25: [0.7, 1.6]; Doc83: [0.9, 1.6]; unseen: ≤ 1.4.
• Round 3 (SA on 2, 2, 3!): Doc17: [1.5, 2.0]; Doc83: [1.4, 1.6]; unseen: ≤ 1.0.
• Round 4 (RA for Doc17): Doc17: 1.7; all others < 1.7; done!
• Note the deviation from round-robin.
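The rounds of this example can be replayed with a short Python sketch. It assumes SUM as the aggregate, uses only the list prefixes visible in the example, and applies no pruning, so it also reports documents the slides no longer show. SCHEDULE, run_rounds and the other names are illustrative.

```python
# Visible prefixes of the three lists (later entries are elided in the slides).
DOC_LISTS = {
    1: [("Doc17", 0.8), ("Doc78", 0.2)],
    2: [("Doc25", 0.7), ("Doc38", 0.5), ("Doc14", 0.5), ("Doc83", 0.5)],
    3: [("Doc83", 0.9), ("Doc17", 0.7), ("Doc61", 0.3)],
}
# Which lists receive a sorted access in rounds 1 to 3 (round 3 is not round-robin).
SCHEDULE = [[1, 2, 3], [1, 2, 3], [2, 2, 3]]

def run_rounds(lists, schedule):
    """Return, per round, ({doc: (worst, best)}, best score of any unseen doc)."""
    pos = {i: 0 for i in lists}   # cursor position per list
    high = {}                     # score at the cursor, per list
    known = {}                    # doc -> {list id: score}
    history = []
    for round_sas in schedule:
        for i in round_sas:
            doc, score = lists[i][pos[i]]
            pos[i] += 1
            high[i] = score
            known.setdefault(doc, {})[i] = score
        bounds = {}
        for doc, scores in known.items():
            worst = round(sum(scores.values()), 2)
            best = round(worst + sum(h for i, h in high.items()
                                     if i not in scores), 2)
            bounds[doc] = (worst, best)
        history.append((bounds, round(sum(high.values()), 2)))
    return history

history = run_rounds(DOC_LISTS, SCHEDULE)
# Round 4: one random access fetches Doc17's score in List 2 (0.2 in the slides).
doc17_final = round(history[2][0]["Doc17"][0] + 0.2, 2)
print(history[2], doc17_final)
```

After round 3 only Doc83 has an upper bound above Doc17's lower bound, which is why a single RA for Doc17 settles the query.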
IBI Algorithm
• Same setting as NRA/CA, except use IBI.
• Maintain two lists: Top-K items (T = d1, …, dk) and StillHaveAShot (SHASH) items (S = dk+1, …, dk+q).
• pos_i = current cursor position on list Li.
• high_i = score in Li at current cursor position (upper-bounds the score of unseen items).
• For items d in S:
  – Which attr scores are known: E(d).
  – Which attr scores are unknown: E~(d).
  – Worst(d) = total score from E(d).
  – Best(d) = Worst(d) + Σ{high_i | i ∈ E~(d)}. (Exactly as Fagin.)
IBI Algorithm (contd.)
• In each round, compute:
  – min-k = min{Worst(d) | d ∈ T}.
  – bestscore that any unseen doc can have = sum of all high_i's.
  – For dj ∈ S: def_j = min-k – Worst(dj). [denotes deficit below qualification level for top-k.]
• T sorted in desc. Worst(); S sorted in desc. Best(). [sorting on (score, ItemID) for fast processing.]
• Invariant: min-k >= max{Worst(d) | d ∈ S}.
• Termination: when min-k >= max{Best(d) | d ∈ S}.
• Can remove an obj from S whenever its Best <= min-k. Stop when S = {}.
• Early termination AND minimal bookkeeping are BOTH important for performance.
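One round of this bookkeeping can be sketched in Python as follows. The state is the one reached after Round 3 of the motivating example; K = 1 is an assumption (the example appears to look for a single best document), SUM is the aggregate, and the extra check on unseen documents in `done` is an assumption of this sketch. All names are illustrative.

```python
def ibi_round(known, high, k):
    """Compute T, S, min-k and deficits from known scores and cursor scores."""
    bounds = {}
    for d, scores in known.items():
        worst = round(sum(scores.values()), 2)        # total over E(d)
        best = round(worst + sum(h for i, h in high.items()
                                 if i not in scores), 2)  # add high_i over E~(d)
        bounds[d] = (worst, best)
    by_worst = sorted(bounds, key=lambda d: -bounds[d][0])
    top = by_worst[:k]                                # list T, desc. Worst()
    min_k = min(bounds[d][0] for d in top)
    # List S: drop every candidate whose Best <= min-k, sort by desc. Best().
    still = sorted((d for d in by_worst[k:] if bounds[d][1] > min_k),
                   key=lambda d: -bounds[d][1])
    deficits = {d: round(min_k - bounds[d][0], 2) for d in still}
    bestscore_unseen = round(sum(high.values()), 2)
    # Assumption: also require that no unseen doc can still qualify.
    done = not still and bestscore_unseen <= min_k
    return top, still, min_k, deficits, done

# State after Round 3 of the motivating example.
known = {
    "Doc17": {1: 0.8, 3: 0.7},
    "Doc25": {2: 0.7},
    "Doc83": {3: 0.9, 2: 0.5},
}
high = {1: 0.2, 2: 0.5, 3: 0.3}
top, still, min_k, deficits, done = ibi_round(known, high, k=1)
print(top, still, min_k, deficits, done)
```

Doc25 drops out because its Best no longer exceeds min-k, while Doc83 stays in S with a small deficit, so the round does not terminate yet.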
More on IBI Framework
• Instead of scheduling SAs round-robin (RR), advance the cursors on diff. lists by different amounts, based on expected score reductions at future cursor positions (Knapsack).
• Do SA*RA* (a run of SAs followed by a run of RAs).
• Order RAs based on estimated Prob[dj can get into top-k answers].