A Static Rank Framework for Lucene/Solr, Mike Schultz
- Slides: 49
A Static Rank Framework for Lucene/Solr
Mike Schultz
mike.schultz@gmail.com
Static Rank for Solr/Lucene
• Dynamic Rank
• Why Static Rank
• Combining Scores
• Static Rank Components
Multiple Fields / Multiple Types
• PubDate: continuous (Date, Int, Float, …)
• IsNews: boolean (True, False)
• MediaType: enum (Book, CD, DVD, Cassette)
• TextBody: text (natural language)
Dynamic Rank
[Diagram: fields PubDate, IsNews, MediaType, TextBody; TextBody and Query → TF * IDF → Dynamic Score]
• Query dependent = F(Q, D)
• Huge dynamic range (0.001 to 1502.3)
• Not comparable across queries
• Not easily normalized
Why Static Rank?
[Diagram: PubDate, IsNews, MediaType → Static Rank System → Static Score]
All (dynamic) things being equal, I want:
• Newer over older
• CD over cassette
• Arbitrary feature A over arbitrary feature B
Static Rank
• Query independent = F(D), i.e. static across queries
• More easily bounded
Combined Rank
[Diagram: the Static Rank System (PubDate, IsNews, MediaType) and TF * IDF (TextBody, Query) both feed a Custom Query, which produces the Combined Score]
Framework - Requirements
• Intuitive, hand-tunable, debuggable
• Query-time only, no re-indexing
• Minimal parameters
• Static rank should boost / demote
  - But not too much!
  - Docs should stay in their own dynamic rank “neighborhood”.
Combining Scores - Approaches
• Addition?
  - Dynamic(0.0001) + Static(0.3) = 0.3001
  - Dynamic(1542.1) + Static(0.3) = 1542.4
  - Difficult to get right across queries
• Multiplication?
  - Dynamic(50.0) * Static(0.3) = 15.0
  - Dynamic(10.0) * Static(2.0) = 20.0
  - Could work, but awkward
• LinearQuery
  1. Bound StaticScore: -1.0 to 1.0
  2. CScore = DScore * (100 + S% * SScore)
  - At most, staticRank will boost/demote dynamicScore by S%
  - CScore = 0.014 * (100 + 30 * 0.5)
  - CScore = 145.3 * (100 + 30 * -0.5)
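The bounded linear combination can be sketched in a few lines of Python to check the arithmetic. This is only an illustration of the formula on the slide, not the LinearQuery code itself; the function name `combined_score` and the explicit clamping of the static score are assumptions made here.

```python
def combined_score(dynamic_score: float, static_score: float, s_percent: float) -> float:
    """CScore = DScore * (100 + S% * SScore).

    The static score is clamped to [-1, 1] here (an assumption; the slide
    only states that it is bounded to that range).
    """
    bounded = max(-1.0, min(1.0, static_score))
    return dynamic_score * (100.0 + s_percent * bounded)


# The two examples from the slide, with S% = 30.
low = combined_score(0.014, 0.5, 30.0)    # 0.014 * (100 + 15)
high = combined_score(145.3, -0.5, 30.0)  # 145.3 * (100 - 15)

# Relative change against the unboosted baseline DScore * 100.
print(low / (0.014 * 100.0))
print(high / (145.3 * 100.0))
```

Because the factor always lies between 100 - S% and 100 + S%, the relative boost or demotion is the same whether the dynamic score is tiny or huge. The constant 100 scales every document equally, so it does not affect ordering.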
LinearQuery
Static Rank
• Extend Solr ValueSource/Parser
• Uses field cache for inputs
• Extremely fast
Static Rank
[Diagram: PubDate → AgoValueSource → years ago; MuxValueSource, switched by IsNews (T/F), selects between that years-ago value and 0 and outputs years ago; MediaType not yet connected]
MuxValueSource Config
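The config shown on this slide is not preserved in the transcript. As a rough illustration of what the first two components compute, here is a minimal Python sketch, not Solr code. The function names, the 365.25-day year, the example dates, and the choice that news documents keep their age while non-news documents get 0 are all assumptions; the diagram only shows that IsNews switches between a years-ago value and 0.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: average year length


def years_ago(pub_date: date, today: date) -> float:
    """Age of a document in fractional years (the role of AgoValueSource)."""
    return (today - pub_date).days / DAYS_PER_YEAR


def mux(selector: bool, if_true: float, if_false: float) -> float:
    """Pick one of two inputs based on a boolean field (the role of MuxValueSource)."""
    return if_true if selector else if_false


today = date(2012, 1, 1)                  # hypothetical "now"
age = years_ago(date(2010, 1, 1), today)  # roughly 2 years

# Assumed branch order: news ages, everything else is treated as age 0.
news_age = mux(True, age, 0.0)
other_age = mux(False, age, 0.0)
print(news_age, other_age)
```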
Static Rank
[Diagram: PubDate → AgoValueSource → MuxValueSource (switched by IsNews, T/F; other input 0) → years ago; MediaType → EnumValueSource]
EnumValueSource Config
• Maps a fixed vocabulary to YEARS AGO
• A hierarchy and 3 values: MIN, 0, MAX
• All things equal (dynamically), DVD = +3.3 years
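The idea of mapping a fixed vocabulary onto the same "years" scale can be sketched as a lookup table. This is a simplified Python illustration, not the actual config: only the DVD value comes from the slide, the fallback of 0.0 for unlisted values is an assumption, and the MIN/0/MAX hierarchy is not modeled. Whether a positive value makes a document act newer or older is also not recoverable from the transcript.

```python
# Offsets in years. Only the DVD entry appears on the slide.
MEDIA_TYPE_YEARS = {
    "DVD": 3.3,
}

DEFAULT_YEARS = 0.0  # assumption: unlisted values are neutral


def enum_years(media_type: str) -> float:
    """Translate an enum value into a years offset (the role of EnumValueSource)."""
    return MEDIA_TYPE_YEARS.get(media_type, DEFAULT_YEARS)


print(enum_years("DVD"))
print(enum_years("Cassette"))
```

Expressing the enum in years is what lets it be summed directly with the date-derived age in the next stage.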
Static Rank
[Diagram: the MuxValueSource and EnumValueSource outputs (both in years ago) → SumValueSource → years ago → ? → value between 1 and -1]
Mapping YearsAgo to the range -1.0 to 1.0
• Step function: if > 10 years ago = -1, else = +1
  - 1 parameter
  - Too abrupt
• Linear
  - No parameters (fixed)
  - Too gradual over 2000+ years
• Sigmoid
  - 2 parameters
  - Smooth over entire range
  - Easy to calculate
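The trade-off between the first two candidates is easy to see numerically. The Python sketch below is illustrative only: the function names are made up, and the fixed 2000-year span for the linear map is an assumption taken from the "2000+ years" remark.

```python
CUTOFF_YEARS = 10.0   # the step function's single parameter (value from the slide)
SPAN_YEARS = 2000.0   # assumption: fixed range covered by the linear map


def step_map(years_ago: float) -> float:
    """-1 for documents older than the cutoff, +1 otherwise."""
    return -1.0 if years_ago > CUTOFF_YEARS else 1.0


def linear_map(years_ago: float) -> float:
    """+1 at age 0, falling linearly to -1 at SPAN_YEARS (clamped outside)."""
    clamped = max(0.0, min(SPAN_YEARS, years_ago))
    return 1.0 - 2.0 * clamped / SPAN_YEARS


for age in (1.0, 9.9, 10.1, 100.0):
    print(age, step_map(age), linear_map(age))
```

The step map flips sign between 9.9 and 10.1 years, while the linear map barely distinguishes a 1-year-old document from a 10-year-old one.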
Sigmoid
[Plot: sigmoid over years ago, ranging from 1.0 down to -1.0; the two parameters are the slope and the x-intercept (year), shown here as x0 = 1.5 years ago]
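A two-parameter sigmoid with these properties can be sketched as follows. The slides do not give the exact formula, so the tanh form, the function name, and the slope value of 1.0 are assumptions; only x0 = 1.5 years ago comes from the slide.

```python
import math


def sigmoid_map(years_ago: float, slope: float = 1.0, x0: float = 1.5) -> float:
    """Smooth map from years ago to [-1, 1].

    Crosses zero at x0; newer documents score positive, older ones negative.
    The tanh form is an assumed choice, not necessarily the one in the talk.
    """
    return math.tanh(-slope * (years_ago - x0))


for age in (0.0, 1.5, 3.0, 10.0, 2000.0):
    print(age, sigmoid_map(age))
```

A larger slope makes the transition around x0 sharper (approaching the step function), while a smaller one flattens it toward the linear case.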
Static Rank
[Diagram: PubDate → AgoValueSource → MuxValueSource (switched by IsNews, T/F; other input 0); MediaType → EnumValueSource; both → SumValueSource (years ago) → SigmoidValueSource → value between 1 and -1]
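Putting the stages of the diagram together, the whole static score can be sketched as one Python function. This is a simplified stand-in for the chain of value sources, not Solr code. The mux branch order, the tanh sigmoid, the slope of 1.0, the 365.25-day year, the neutral default for unlisted media types, and the example dates are all assumptions; x0 = 1.5 and DVD = 3.3 come from the slides.

```python
import math
from datetime import date

MEDIA_YEARS = {"DVD": 3.3}  # only the DVD value appears in the slides


def static_score(pub_date: date, is_news: bool, media_type: str, today: date,
                 slope: float = 1.0, x0: float = 1.5) -> float:
    age = (today - pub_date).days / 365.25      # AgoValueSource
    muxed = age if is_news else 0.0             # MuxValueSource (assumed branch order)
    offset = MEDIA_YEARS.get(media_type, 0.0)   # EnumValueSource (assumed default 0)
    total = muxed + offset                      # SumValueSource, in years
    return math.tanh(-slope * (total - x0))     # SigmoidValueSource (assumed form)


today = date(2012, 1, 1)  # hypothetical "now"
fresh_news = static_score(date(2011, 10, 1), True, "Book", today)
old_news = static_score(date(2005, 1, 1), True, "Book", today)
print(fresh_news, old_news)
```

Every feature is converted to years before the sum, so a single sigmoid at the end bounds the result to the -1.0 to 1.0 range that the combined-score formula expects.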
SigmoidValueSource Config
Static Rank Config
Conclusion
• Solr ValueSource/Parser: fast and flexible
• CScore = DScore * (100 + S% * SScore)
• -1.0 < SScore < 1.0
• “Time” as a common currency for static features