Ranking-based Processing of SQL Queries
Date: 2012/1/16
Source: Hany Azzam (CIKM '11)
Speaker: Er-gang Liu
Advisor: Dr. Jia-ling Koh
Outline
- Introduction
- The Core Retrieval Models
  - TF-IDF
  - LM
- Model: Tuple Retrieval
- Algorithm: SQL-to-PSQL
  - Basic Views
  - TF-IDF-based Processing of SQL Queries
  - LM-based Processing of SQL Queries
- Experiment
- Conclusion
Introduction
- Motivation:
  - Support document/context retrieval and tuple retrieval
  - "Seamlessly" integrated IR+DB technology
- Goal:
  - Use IR models to process SQL queries, and develop the application of PSQL to tuple retrieval
Introduction
Properties
  Area     Price  Type
  LA       210    Flat
  Texas    230    Studio
  Florida  260    Flat
  LA       225    Room

A typical SQL query is decomposed into an index part and a retrieval part.

Index Part
  Area   Type
  LA     Flat
  Texas  Studio
  LA     Room

Retrieval Part
  Area
  LA
  Texas
Introduction: Bayes
TF-IDF RSV
- N_D(c): number of documents in collection c
- n_D(t, c): number of documents with term t in collection c
- df_t := n_D(t, c) is the document frequency
- N_L(c): number of locations in collection c
- n_L(t, c): number of locations with term t in collection c
- N_L(d) and n_L(t, d): location-based counts for document d
- tf_d := n_L(t, d)
Example collection c:
  d1: t1, t2
  d2: t1, t2
  d3: t1, t3
  d4: t2
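As a minimal sketch, the counts defined above can be computed for the example collection c. The dictionary layout and the function names are assumptions made for illustration; each document is represented as the list of its term locations.

```python
from collections import Counter

# Example collection c from the slide: each document is a list of term locations.
collection = {
    "d1": ["t1", "t2"],
    "d2": ["t1", "t2"],
    "d3": ["t1", "t3"],
    "d4": ["t2"],
}

def doc_counts(coll):
    """Return N_D(c) and a Counter mapping t -> n_D(t, c)."""
    n_d = Counter()
    for terms in coll.values():
        n_d.update(set(terms))  # each document counts once per term
    return len(coll), n_d

def loc_counts(coll):
    """Return N_L(c) and a Counter mapping t -> n_L(t, c)."""
    n_l = Counter()
    for terms in coll.values():
        n_l.update(terms)  # every location counts
    return sum(n_l.values()), n_l

N_D, n_D = doc_counts(collection)
N_L, n_L = loc_counts(collection)

df_t1 = n_D["t1"]                         # document frequency of t1
tf_t1_d1 = collection["d1"].count("t1")   # tf_d := n_L(t1, d1)
print(N_D, N_L, df_t1, tf_t1_d1)
```

In this small collection every term occurs at most once per document, so the document-based and location-based counts of a term coincide; they differ as soon as a term repeats within a document.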
TF-IDF RSV
- TF-IDF term weight: the weight is defined as follows: [formula not preserved]
Example collection c:
  d1: t1, t2
  d2: t1, t2
  d3: t1, t3
  d4: t2
Query: Q = t1, t2
LM RSV
- Language modelling (LM): the LM term weight is defined as follows: [formula not preserved]
Example collection c:
  d1: t1, t2
  d2: t1, t2
  d3: t1, t3
  d4: t2
Query: Q = t1, t2
Tuple Retrieval
  QueryId  DocId
  q1       Doc1
  q1       Doc2
  q1       Doc3
  q1       Doc4
SQL2PSQL Algorithm: Basic Views
- Tuple-based (location-based) probabilities, P_Z(X)
- Conditional probabilities, P_Z(X|Y)
- Value-based (document-based) probabilities, P_Z[x](X|Y)
SQL2PSQL Algorithm: Basic Views
- Information-based probabilities, P_Z(X informs)
TF-IDF-based Processing of SQL Queries
  Weight                  Term     DocId
  0.069 = 0.5  * 0.1386   sailing  doc1
  0.159 = 0.5  * 0.3174   boats    doc1
  0.091 = 0.66 * 0.1386   sailing  doc2
  0.105 = 0.33 * 0.3174   boats    doc2
  0.046 = 0.33 * 0.1386   sailing  doc3
  0.33  = 0.33 * 1        east     doc3
  0.33  = 0.33 * 1        coast    doc3
  0.139 = 1.0  * 0.1386   sailing  doc4
  0.317 = 1.0  * 0.3174   boats    doc5
TF-IDF-based Processing of SQL Queries
Query: value1 = sailing, value2 = east
  Score                  DocId
  0.069                  Doc1
  0.091                  Doc2
  0.376 = 0.046 + 0.33   Doc3
  0.139                  Doc4
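The weights and scores above can be reproduced approximately with a short sketch. Several points are assumptions: the document contents are inferred from the listed term frequencies (for example, 0.66 for sailing in doc2 suggests two of three locations), the IDF is taken as a natural-log IDF normalised by its maximum, and the function names are illustrative. The slide rounds 2/3 and 1/3 to 0.66 and 0.33, so the computed values can differ in the third decimal.

```python
import math

# Assumed document contents, inferred from the term frequencies on the slide.
docs = {
    "doc1": ["sailing", "boats"],
    "doc2": ["sailing", "sailing", "boats"],
    "doc3": ["sailing", "east", "coast"],
    "doc4": ["sailing"],
    "doc5": ["boats"],
}

def tf(term, doc_id):
    """Location-based within-document frequency n_L(t, d) / N_L(d)."""
    terms = docs[doc_id]
    return terms.count(term) / len(terms)

def normalised_idf(term):
    """-log(n_D(t, c) / N_D(c)), divided by the IDF of a term with df = 1."""
    df = sum(1 for terms in docs.values() if term in terms)
    max_idf = -math.log(1 / len(docs))
    return -math.log(df / len(docs)) / max_idf

def rsv(query_terms, doc_id):
    """Sum of TF * IDF weights over the query terms that occur in the document."""
    return sum(tf(t, doc_id) * normalised_idf(t)
               for t in query_terms if t in docs[doc_id])

query = ["sailing", "east"]
scores = {d: rsv(query, d) for d in docs if rsv(query, d) > 0}
for doc_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{doc_id}: {score:.3f}")
```

Summing the weights of the matching query terms is what ranks Doc3 first here: it is the only document that contains both sailing and east.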
LM-based Processing of SQL Queries
  Weight                                     Term     DocId
  Log(1 + 1)    = Log[1 + (0.5  / 0.5)]      sailing  doc1
  Log(1 + 1.66) = Log[1 + (0.5  / 0.3)]      boats    doc1
  Log(1 + 1.32) = Log[1 + (0.66 / 0.5)]      sailing  doc2
  Log(1 + 1.1)  = Log[1 + (0.33 / 0.3)]      boats    doc2
  Log(1 + 0.66) = Log[1 + (0.33 / 0.5)]      sailing  doc3
  Log(1 + 3.3)  = Log[1 + (0.33 / 0.1)]      east     doc3
  Log(1 + 3.3)  = Log[1 + (0.33 / 0.1)]      coast    doc3
  Log(1 + 2)    = Log[1 + (1.0  / 0.5)]      sailing  doc4
  Log(1 + 3.33) = Log[1 + (1.0  / 0.3)]      boats    doc5
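A minimal sketch of the weights listed above, assuming they have the form log(1 + P(t|d) / P(t|c)) with both probabilities estimated from location counts. The document contents are inferred from the ratios on the slide, the base of the logarithm is not stated (the natural logarithm is used here), any mixture or smoothing parameter of the original model is not modelled, and the function names are illustrative.

```python
import math

# Assumed document contents, inferred from the ratios on the slide.
docs = {
    "doc1": ["sailing", "boats"],
    "doc2": ["sailing", "sailing", "boats"],
    "doc3": ["sailing", "east", "coast"],
    "doc4": ["sailing"],
    "doc5": ["boats"],
}

def p_t_d(term, doc_id):
    """Within-document probability n_L(t, d) / N_L(d)."""
    terms = docs[doc_id]
    return terms.count(term) / len(terms)

def p_t_c(term):
    """Collection-wide probability n_L(t, c) / N_L(c)."""
    total = sum(len(terms) for terms in docs.values())
    hits = sum(terms.count(term) for terms in docs.values())
    return hits / total

def lm_weight(term, doc_id):
    """Assumed LM term weight: log(1 + P(t|d) / P(t|c))."""
    return math.log(1 + p_t_d(term, doc_id) / p_t_c(term))

weights = {(t, d): lm_weight(t, d)
           for d, terms in docs.items() for t in sorted(set(terms))}
for (term, doc_id), w in weights.items():
    print(f"{term:8s} {doc_id}: {w:.3f}")
```

Under these assumptions the collection probabilities come out as 0.5 for sailing, 0.3 for boats and 0.1 for east and coast, which are the denominators shown on the slide.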
LM-based Processing of SQL Queries
Query: value1 = sailing, value2 = east
  Score                     DocId
  0.25                      Doc1
  0.33                      Doc2
  0.005 = 0.165 * 0.033     Doc3
  0.5                       Doc4
Experiment
- Aim: to investigate the implementation of the retrieval models by examining how much retrieval quality can be achieved and at what cost.
Experiment - Evaluation
- MAP (Mean Average Precision)
  Topic 1: 4 relevant pages, at ranks 1, 2, 4, 7
  Topic 2: 5 relevant pages, at ranks 1, 3, 5, 7, 10
  Topic 1 Average Precision: (1/1 + 2/2 + 3/4 + 4/7) / 4 = 0.83
  Topic 2 Average Precision: (1/1 + 2/3 + 3/5 + 4/7 + 5/10) / 5 = 0.67
  MAP = (0.83 + 0.67) / 2 = 0.75
- Reciprocal Rank
  Topic 1 Reciprocal Rank: (1 + 1/2 + 1/4 + 1/7) / 4 = 0.47
  Topic 2 Reciprocal Rank: (1 + 1/3 + 1/5 + 1/7 + 1/10) / 5 = 0.36
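The formulas above can be evaluated with a short sketch. The function names are illustrative. The second function follows the slide's formula, which averages the reciprocal ranks of all relevant pages; the more common definition of reciprocal rank uses only the first relevant result.

```python
def average_precision(relevant_ranks):
    """Mean of precision-at-rank over the ranks of the relevant pages.

    Assumes every relevant page appears in the ranking, so the divisor
    is simply the number of ranks given.
    """
    ranks = sorted(relevant_ranks)
    return sum((i + 1) / r for i, r in enumerate(ranks)) / len(ranks)

def mean_reciprocal_of_ranks(relevant_ranks):
    """Average of 1/rank over all relevant pages, as written on the slide."""
    return sum(1 / r for r in relevant_ranks) / len(relevant_ranks)

topics = {
    "Topic 1": [1, 2, 4, 7],
    "Topic 2": [1, 3, 5, 7, 10],
}

ap = {name: average_precision(ranks) for name, ranks in topics.items()}
map_score = sum(ap.values()) / len(ap)

for name, ranks in topics.items():
    print(f"{name}: AP = {ap[name]:.2f}, "
          f"reciprocal rank = {mean_reciprocal_of_ranks(ranks):.2f}")
print(f"MAP = {map_score:.2f}")
```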
Conclusion
- Supports the high-level (abstract) modelling of general and specific retrieval tasks (ad-hoc retrieval, classification, summarisation, structured document retrieval, hypertext retrieval, multimedia retrieval, ...)
- Slides: 25