A Static Rank Framework for Lucene/Solr Mike Schultz

A Static Rank Framework for Lucene/Solr
Mike Schultz
mike.schultz@gmail.com

Static Rank for Solr/Lucene
• Dynamic Rank
• Why Static Rank
• Combining Scores
• Static Rank Components

Multiple Fields / Multiple Types
• PubDate: Continuous (Date, Int, Float, …)
• IsNews: Boolean (True, False)
• MediaType: Enum (Book, CD, DVD, Cassette)
• TextBody: Text (Natural Language)

Dynamic Rank
[Diagram: PubDate, IsNews, MediaType, TextBody; Query; TF * IDF; Dynamic Score]
• Query dependent = F(Q, D)
• Huge dynamic range (0.001 - 1502.3)
• Not comparable across queries
• Not easily normalized

Why Static Rank?
[Diagram: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]
All (dynamic) things equal, I want
– Newer over older
– CD over cassette
– Arbitrary feature A over arbitrary feature B

Static Rank
[Diagram: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]
• Query independent = F(D)
  – i.e. static across queries
• More easily bounded

Combined Rank
[Diagram: PubDate, IsNews, MediaType, TextBody; Static Rank System; Query; TF * IDF; Custom Query; Combined Score]

Framework - Requirements
[Diagram: Custom Query; Combined Score]
• Intuitive, hand-tunable, debuggable
• Query-time only, no re-indexing
• Minimal parameters
• Static Rank should boost / demote
  – But not too much!
  – Docs should stay in their own dynamic rank “neighborhood”.

Combining Scores - Approaches
[Diagram: Custom Query; Combined Score]
• Addition?
  – Dynamic(0.0001) + Static(0.3) = 0.3001
  – Dynamic(1542.1) + Static(0.3) = 1542.4
  – Difficult to get right across queries

Combining Scores - Approaches
[Diagram: Custom Query; Combined Score]
• Multiplication?
  – Dynamic(50.0) * Static(0.3) = 15.0
  – Dynamic(10.0) * Static(2.0) = 20.0
  – Could work, but awkward

Combining Scores - Approaches
[Diagram: Linear Query; Combined Score]
1. Bound StaticScore: -1.0 to 1.0
2. CScore = DScore * (100 + S% * SScore)
– At most, staticRank will boost/demote dynamicScore by S%
– CScore = 0.014 * (100 + 30 * 0.5)
– CScore = 145.3 * (100 + 30 * -0.5)
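The linear combination can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide: the function name `combined_score` and the explicit clamping step are assumptions made for illustration, not the framework's actual code.

```python
def combined_score(dscore: float, sscore: float, s_percent: float) -> float:
    """CScore = DScore * (100 + S% * SScore), with SScore bounded to [-1, 1]."""
    # Clamp the static score so the boost or demotion never exceeds S%.
    sscore = max(-1.0, min(1.0, sscore))
    return dscore * (100.0 + s_percent * sscore)


# The two cases from the slide, with S% = 30.
boosted = combined_score(0.014, 0.5, 30)    # 0.014 * 115
demoted = combined_score(145.3, -0.5, 30)   # 145.3 * 85
print(boosted, demoted)
```

With S% = 30 the multiplier always lies between 70 and 130, so a document can only overtake another whose dynamic score is within a factor of 130/70 (about 1.86) of its own. That is the sense in which documents stay in their dynamic rank “neighborhood”.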

LinearQuery

Static Rank
[Diagram: PubDate, IsNews, MediaType, TextBody; Query; Static Rank System; Static Score]
• Extend solr.ValueSource/Parser
• Uses field cache for inputs
• Extremely fast

Static Rank
[Diagram: PubDate → AgoValueSource → years ago; IsNews → MuxValueSource (T / F; inputs: years ago, 0) → years ago; MediaType]

MuxValueSource Config

Static Rank
[Diagram: PubDate → AgoValueSource → years ago; IsNews → MuxValueSource (T / F; inputs: years ago, 0) → years ago; MediaType → EnumValueSource]

EnumValueSource Config
• Maps Fixed-Vocabulary to YEARS AGO
• A hierarchy and 3 values: MIN, 0, MAX
• All things equal (dynamically), DVD = +3.3 years

Static Rank
[Diagram: PubDate → AgoValueSource → years ago; IsNews → MuxValueSource (T / F; inputs: years ago, 0) → years ago; MediaType → EnumValueSource; SumValueSource → years ago → ? → 1 to -1]

Mapping YearsAgo to -1.0 – 1.0
• Step Function: if > 10 years-ago = -1, else = +1
  – 1 parameter
  – Too abrupt
• Linear
  – No parameters (fixed)
  – Too gradual over 2000+ years
• Sigmoid
  – 2 parameters
  – Smooth over entire range
  – Easy to calculate

Sigmoid
• Slope
• x-intercept (year)

[Plot: sigmoid over years-ago, y-axis from 1.0 to -1.0, x0 = 1.5 years ago]
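The three candidate mappings from years-ago to the -1.0 to 1.0 range can be compared side by side. The slides do not give the exact formulas, so this sketch makes several assumptions: the sigmoid is written as a negated `tanh` with parameters `slope` and `x0` (the x-intercept), the default `slope=1.0` is arbitrary, and the linear map is taken to span a fixed 2000 years.

```python
import math


def step(years_ago: float, cutoff: float = 10.0) -> float:
    """Step function: -1 if older than the cutoff, else +1 (one parameter)."""
    return -1.0 if years_ago > cutoff else 1.0


def linear(years_ago: float, max_years: float = 2000.0) -> float:
    """Straight line from +1 at 0 years ago to -1 at max_years (assumed fixed span)."""
    x = max(0.0, min(max_years, years_ago))
    return 1.0 - 2.0 * x / max_years


def sigmoid(years_ago: float, slope: float = 1.0, x0: float = 1.5) -> float:
    """Smooth S-curve in [-1, 1]: 0 at x0, positive when newer, negative when older."""
    # Assumed form: a negated tanh, so that newer documents score higher.
    return -math.tanh(slope * (years_ago - x0))


for age in (0.0, 1.5, 5.0, 50.0):
    print(age, step(age), round(linear(age), 4), round(sigmoid(age), 4))
```

Under these assumptions the step function flips abruptly at the cutoff, the linear map still gives a 5-year-old document 0.995, and the sigmoid crosses zero at `x0` while staying smooth on both sides.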

Static Rank
[Diagram: PubDate → AgoValueSource → years ago; IsNews → MuxValueSource (T / F; inputs: years ago, 0) → years ago; MediaType → EnumValueSource; SumValueSource → years ago → SigmoidValueSource → 1 to -1]
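The chain in the diagram can be imitated in plain Python to show how every feature is expressed in years before the sigmoid is applied. This is not Solr code, and much of it is assumed rather than taken from the slides: the `Doc` class and function names are hypothetical, the Mux is assumed to pass the real age for news and 0 otherwise, a preferred media type is assumed to subtract its offset from the age, and only the DVD value of 3.3 years comes from the deck.

```python
import math
from dataclasses import dataclass

# Years-ago offsets per media type. Only the DVD value appears in the slides;
# the Cassette value is a made-up illustration.
MEDIA_OFFSET_YEARS = {"DVD": 3.3, "Cassette": -5.0}


@dataclass
class Doc:
    pub_years_ago: float  # stands in for AgoValueSource applied to PubDate
    is_news: bool
    media_type: str


def mux(doc: Doc) -> float:
    # Assumption: news documents use their real age, everything else uses 0.
    return doc.pub_years_ago if doc.is_news else 0.0


def enum_years(doc: Doc) -> float:
    # Assumption: a positive offset makes the document look that many years newer.
    return -MEDIA_OFFSET_YEARS.get(doc.media_type, 0.0)


def static_score(doc: Doc, slope: float = 1.0, x0: float = 1.5) -> float:
    total_years_ago = mux(doc) + enum_years(doc)       # SumValueSource stand-in
    return -math.tanh(slope * (total_years_ago - x0))  # SigmoidValueSource stand-in


print(static_score(Doc(1.0, True, "Book")), static_score(Doc(1.0, True, "DVD")))
```

Because every input is converted to years first, one sigmoid with two parameters is enough to bound the whole static score.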

SigmoidValueSource Config

Static Rank Config

Conclusion
• solr.ValueSource/Parser - fast and flexible
• CScore = DScore * (100 + S% * SScore)
• -1.0 < SScore < 1.0
• “Time” as a common currency for static features