Exploiting Temporal References in Text Retrieval Irem Arikan

Exploiting Temporal References in Text Retrieval
Irem Arikan
Advised by: Srikanta Bedathur, Klaus Berberich

Motivation
• users’ information needs often have a temporal dimension,
• but traditional information retrieval systems do not exploit the temporal content in documents.
• query: PM United Kingdom 2000
• the search engine is not aware that 2000 is actually mentioned implicitly by the document
Needed: an approach that recognizes and exploits temporal references in documents to yield better search results

Example Temporal Queries
Broad Queries
• British colony 17th century
• Economic situation Germany 1920s
• President assassination 1950 – 2000
Specific Queries
• US president October 1962
• Pope 1940s
• Academy awards best actor 1975
Ambiguous Queries
• George Bush 1990 vs. George Bush 2007
• Gulf war 1991 vs. Gulf war 2005

Outline
• Language Modeling for Information Retrieval
• Time Modeling for Temporal Information Retrieval
• Combining Text Relevance with Temporal Relevance
• Experimental Results

Language Modeling for Information Retrieval
• Language Model: a statistical model to generate text
• Language Modeling: the task of estimating the statistical parameters of a language model
• Language Modeling for IR: the problem of estimating the likelihood that a query and a document could have been generated by the same language model
• In practical IR approaches: Unigram Language Model
  • words occur independently

Language Modeling for IR
1) document: a sample from a language model
  • assume an underlying multinomial probability distribution over words for each document
  • estimate statistics of this distribution: P[word]
  • from the document, infer the model Md: P[word | Md]
2) estimate the likelihood that the query is generated by this distribution
3) rank the documents by P(q | d)
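The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which smoothing was used, so Jelinek-Mercer smoothing with a hypothetical weight `lam` is assumed here, and the two toy documents are made up.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Return log P(q | Md) under a unigram model (words independent).

    Assumption: Jelinek-Mercer smoothing with weight `lam`; the slides
    do not specify a smoothing method.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for word in query:
        p_doc = doc_tf[word] / len(doc) if doc else 0.0
        p_coll = coll_tf[word] / len(collection)
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # word never seen in the collection
        score += math.log(p)
    return score


# Toy corpus (illustrative only, not from the slides).
docs = {
    "d1": "prime minister of the united kingdom".split(),
    "d2": "the kingdom of spain has a prime minister".split(),
}
collection = [w for d in docs.values() for w in d]
query = "united kingdom".split()

# Step 3: rank documents by their query likelihood.
ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], collection),
    reverse=True,
)
```

Working in log space avoids underflow when queries get longer; the ranking is unchanged because the logarithm is monotone.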

Temporal Modeling for Temporal Retrieval
General approach
• similar to LM approach
• based on a generative model which generates temporal references
• temporal model splits query into 2 parts: text query and temporal query
Probabilistic mechanism for producing temporal content of the document
• each time reference generated by a different generative temporal model
• for generating a time reference
  1) first choose a temporal model
  2) then generate a time reference using this temporal model

Temporal Modeling
Estimating temporal query likelihood
• Infer a temporal model from each temporal reference in the document
• Estimate the likelihood that the temporal query is generated by one of the models which generated the temporal content of the document
[Equation: temporal query generation probability]
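The formula on this slide did not survive extraction. One plausible reading of the two-step generative process (choose a temporal model, then generate a time reference from it) is a mixture over the temporal models inferred from the document. The symbols below are assumptions introduced for illustration: T_d is the set of temporal references in document d, M_T is the model inferred from reference T, and the uniform choice of model is a simplification not confirmed by the slides.

```latex
P(q_{\text{time}} \mid d)
  = \sum_{T \in T_d} P(M_T \mid d)\, P(q_{\text{time}} \mid M_T)
  \approx \frac{1}{|T_d|} \sum_{T \in T_d} P(q_{\text{time}} \mid M_T)
```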

Temporal Modeling
What is a temporal model?
• A probabilistic model to generate temporal references
• What kind of distribution?
• How can we estimate its parameters?
Formalize the problem in a goal-oriented way:
• We should infer a temporal model from each time interval (sample time interval)
• This temporal model should be able to generate all time intervals which are relevant to the sample interval

1. Approach
Assumptions:
• two time intervals are relevant to each other only if they intersect
• the generative model inferred should be able to produce subintervals, superintervals, and overlapping intervals of the interval in the document
• the probability of generating an intersecting time interval should be proportional to the length of the intersection
Example:
• query: 1980 – 1990
• 1980 – 1989 is more relevant than 23 March 1984
Appropriate probabilistic model:
• 2 underlying triangular distributions
  • one for start,
  • one for end
[Figure: left overlap, right overlap, superintervals (sup 1, sup 2) and subintervals (sub 1, sub 2) of the document interval [s, e] on the time axis t]

Triangular Distribution Parameters Support
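The parameter and support entries of this slide were lost in extraction. For reference, the standard triangular distribution has a lower limit a, an upper limit b, and a mode c with a ≤ c ≤ b, and its support is a ≤ x ≤ b. Its density, written with these conventional symbols (which may differ from the ones on the original slide), is:

```latex
f(x \mid a, b, c) =
\begin{cases}
  \dfrac{2\,(x - a)}{(b - a)(c - a)} & a \le x < c, \\[2ex]
  \dfrac{2}{b - a}                   & x = c, \\[2ex]
  \dfrac{2\,(b - x)}{(b - a)(b - c)} & c < x \le b, \\[2ex]
  0                                  & \text{otherwise.}
\end{cases}
```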

1. Approach
• nonzero probability for intersecting intervals
  • r1 – r3: left overlaps
  • r1 – r4: superintervals
  • r2 – r3: subintervals
  • r2 – r4: right overlaps
• interval [s, e] has the highest probability
• probability decreases to the left and right, resulting in lower probability for intervals which have smaller intersection lengths
[Figure: regions r1 to r4 around the document interval [s, e], with outer bounds l and u]
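A minimal sketch of how the two triangular distributions could score a query interval is given below. It is an interpretation, not the author's implementation: the start point is drawn from a triangular distribution peaking at s and the end point from one peaking at e, and the score is the product of the two densities. The outer bounds `lower` and `upper`, the use of e + 1 and s - 1 as inner bounds (one reading of the figure labels), and the integer year granularity are all assumptions.

```python
def triangular_pdf(x, lower, mode, upper):
    """Density of a triangular distribution on [lower, upper] peaking at mode."""
    if x < lower or x > upper or lower == upper:
        return 0.0
    if x < mode:
        return 2.0 * (x - lower) / ((upper - lower) * (mode - lower))
    if x > mode:
        return 2.0 * (upper - x) / ((upper - lower) * (upper - mode))
    return 2.0 / (upper - lower)


def interval_likelihood(q_start, q_end, s, e, lower, upper):
    """Score query interval [q_start, q_end] against document interval [s, e].

    Assumptions (hypothetical, for illustration): integer time units;
    start ~ Triangular(lower, s, e + 1), end ~ Triangular(s - 1, e, upper).
    The density is zero at the support boundaries, so a query that starts
    after e or ends before s (no intersection) scores zero.
    """
    if q_start > q_end:
        return 0.0
    p_start = triangular_pdf(q_start, lower, s, e + 1)
    p_end = triangular_pdf(q_end, s - 1, e, upper)
    return p_start * p_end


# Document interval 1980-1990; outer bounds are made-up values.
exact = interval_likelihood(1980, 1990, 1980, 1990, 1900, 2100)
shorter = interval_likelihood(1980, 1989, 1980, 1990, 1900, 2100)
single_year = interval_likelihood(1984, 1984, 1980, 1990, 1900, 2100)
disjoint = interval_likelihood(1991, 1995, 1980, 1990, 1900, 2100)
```

With these assumed bounds the scores decrease from `exact` to `shorter` to `single_year`, and `disjoint` is zero, which matches the qualitative behaviour described on the slide. The product of two triangular densities only approximates proportionality to the intersection length.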

2. Approach
Assumptions:
• two time intervals are relevant to each other only if they are positioned close to each other on the time axis and have similar lengths
  • | start 1 – start 2 | < a
  • | length 1 – length 2 | < b
• the generative model inferred should be able to produce temporal intervals in some neighbourhood on the time axis
[Figure: interval with start s and length l on the time axis t, with offsets ∆s and ∆l]

2. Approach
• Temporal interval x = s, y = l has the highest probability
• Probability decreases as the start point moves away from s and as the length moves away from l
[Figure: start axis marked s - a, s, s + a; length axis marked l - b, l, l + b]
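The second approach can be sketched in the same spirit. The slides do not give the exact density, so the code below assumes a symmetric triangular ("tent") density on the start point over (s - a, s + a) and another on the length over (l - b, l + b), multiplied together; the function names are hypothetical.

```python
def tent(x, center, half_width):
    """Symmetric triangular density centred at `center`.

    It is zero once |x - center| >= half_width and integrates to 1.
    """
    if half_width <= 0:
        return 0.0
    distance = abs(x - center)
    if distance >= half_width:
        return 0.0
    return (half_width - distance) / (half_width ** 2)


def neighbourhood_likelihood(q_start, q_end, s, e, a, b):
    """Score a query interval by closeness of start point and length.

    Assumption: start and length are treated as independent, each with a
    tent density; `a` and `b` are the tolerances from the slide.
    """
    q_length = q_end - q_start
    doc_length = e - s
    return tent(q_start, s, a) * tent(q_length, doc_length, b)


# Document interval 1980-1990, tolerances a = b = 5 (made-up values).
same = neighbourhood_likelihood(1980, 1990, 1980, 1990, 5, 5)
near = neighbourhood_likelihood(1982, 1990, 1980, 1990, 5, 5)
far = neighbourhood_likelihood(1986, 1996, 1980, 1990, 5, 5)
```

Unlike the first approach, a query here can intersect the document interval and still score zero if its length differs from l by b or more.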

Combining Text Relevance with Temporal Relevance
• Text relevance
• Temporal relevance
• Filter and re-rank search results by weighting the text relevance score by temporal relevance
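The filter-and-re-rank step could look roughly like the sketch below. It assumes that both scores are non-negative (for example probabilities rather than log-probabilities), that weighting means a plain product, and that documents with zero temporal relevance are filtered out. The function name, document ids, and scores are hypothetical.

```python
def rerank(text_results, temporal_scores):
    """Weight text relevance by temporal relevance, then filter and sort.

    text_results: list of (doc_id, text_score) from the text retrieval step.
    temporal_scores: dict mapping doc_id to a temporal relevance score;
    documents missing from the dict are treated as temporally irrelevant.
    """
    combined = []
    for doc_id, text_score in text_results:
        temporal = temporal_scores.get(doc_id, 0.0)
        if temporal > 0.0:  # filter: drop temporally irrelevant documents
            combined.append((doc_id, text_score * temporal))
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


# Made-up scores for illustration only.
text_results = [("doc_a", 0.9), ("doc_b", 0.6), ("doc_c", 0.8)]
temporal_scores = {"doc_a": 0.1, "doc_b": 0.5}
reranked = rerank(text_results, temporal_scores)
```

In this toy run `doc_c` is removed because it has no temporal score, and `doc_b` overtakes `doc_a` despite its lower text score.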

System Architecture
Information Retrieval (IR) with Temporal Extension
[Diagram components: Query, IR System, Index, Result Set; Temporal Query, Temporal Retrieval, Index for temporal references, Result Set]

Experimental Results-1
Query: Spanish painter 18th century
• Terrier: Art_in_Puerto_Rico, Spanish_art, Palazzo_Bianco_(Genoa), Caprichos, List_of_people_from_Antwerp
• Boolean: Agustín_Esteve, Acislo_Antonio_Palomino_de_Castro_y_Velasco, Alvarez, Agostino_Scilla_00e6, Bassano
• Our Method: José_del_Castillo, Agustín_Esteve, Roybal, Maldonado, Luis_Egidio_Meléndez

Experimental Results-2
Query: Chancellor Germany 1955
• Terrier: Federal_Minister_for_Special_Affairs_of_Germany, Otto_Gessler, Bonn-Paris_conventions, Occupation_statute, Petersberg_Agreement
• Boolean: Basic_Law_for_the_Federal_Republic_of_Germany, Bonn-Paris_conventions, Bavaria_Party, All-German_Bloc_League_of_Expellees_and_Deprived_of_Rights, Anschluss
• Our Method: Occupation_statute, Second_German_Bundestag, West_Germany, Bonn-Paris_conventions, Konrad_Adenauer

Experimental Results-3
Query: George Bush 1990
Systems compared: Terrier, Boolean, Our Method
Results (in extraction order, across the three columns): George_W._Bush_insider_trading_allegations, Bush_family, President_Bush_family, Bush_administration, Early_life_of_George_W._Bush, Andrew_Card, President's Council of Advisors on Science and Technology, George_H._W._Bush, Approval_rating, George_H._W._Bush, C_Boyden_Gray, Brent_Scowcroft, Arbusto_Energy

Thanks!