6 ~ GIR

Motivation
• Former GIR:
  › captured and handled geonames and their associated features, but ignored other terms with important geographic connotation:
    • spatial relationships (in, near, on the shore of, etc.)
    • feature types (cities, mountains, airports, etc.)
  › disambiguated geonames
  › used a graph-ranking algorithm to analyse the captured features and assign one single feature as the scope of each document
    • other partial geographic contexts of the document were ignored
    • incorrectly assigned scopes often led to poor results

Problem Definition
• Rebuilt the query processing module
  › all geographic information present in a query is captured
  › special attention is given to feature types and spatial relationships, as guides for the geographic query expansion
• Use text mining methods to capture and extract disambiguated geonames from text
  › so that a geographic scope can be inferred for each document

Objective
• Generation of geographic signatures for both queries (QSig) and documents (DSig)
  › DSig is generated for each document by a text mining module
  › QSig is generated through a geographic query expansion module
• Geographic query expansion focused on features, feature types and spatial relationships
• Geographic ranking improvement

New Architecture of Geographic IR
• Topic titles as query string

Geographic Ontology
• All modules rely on the geographic ontology
• Uses GKB 2.0 (Geographic Knowledge Base)
  § supports relationships between features and feature types
  § better property assignment for features and feature types
  § better control of information sources
  § enrichment of the physical domain, with the addition of new feature types (airports, circuits and mountains), along with their instances

Statistics of the Geographic Ontology

Query Processing (1)
Geographic query parsing module
• works with the help of the geographic ontology and manually crafted context rules
• splits the query into <what, spatial relation, where>
• recognizes features and feature types
Example: Ship traffic in Portuguese island

Query Processing (2)
Perform:
1. Term Expansion
   • expands the thematic part ~ what
   • Blind Relevance Feedback
2. Geographic Expansion
   • expands the geographic part ~ where
   • based on query type
   • driven by spatial relationship, feature and feature type

Query Processing (3)
Example: CLEF topic #74, Ship traffic in Portuguese island
• Ship traffic: thematic part ~ what
• in: spatial relationship
• Portuguese: feature ~ grounded geoname
• island: feature type
Mapped into the corresponding ontological concepts
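The split into <what, spatial relation, where> can be sketched as follows. This is only a minimal illustration: the lookup tables `SPATIAL_RELATIONS`, `FEATURE_TYPES` and `FEATURES` and the function `parse_geo_query` are hypothetical stand-ins for the ontology and the context rules, not the module described in the slides.

```python
import re

# Hypothetical stand-ins for the ontology lookups and context rules.
SPATIAL_RELATIONS = ["on the shore of", "near", "in"]  # longer phrases first
FEATURE_TYPES = {"island": "island", "islands": "island", "city": "city"}
FEATURES = {"portuguese": "Portugal", "portugal": "Portugal", "lisbon": "Lisbon"}


def parse_geo_query(query):
    """Split a query into what / spatial relation / where (features and types)."""
    lowered = query.lower()
    for relation in SPATIAL_RELATIONS:
        match = re.search(r"\b" + re.escape(relation) + r"\b", lowered)
        if match:
            what = query[:match.start()].strip()
            where = query[match.end():].strip()
            break
    else:
        # No spatial relation found: treat the whole query as thematic.
        return {"what": query.strip(), "relation": None,
                "features": [], "feature_types": []}
    tokens = where.lower().split()
    return {
        "what": what,
        "relation": relation,
        "features": [FEATURES[t] for t in tokens if t in FEATURES],
        "feature_types": [FEATURE_TYPES[t] for t in tokens if t in FEATURE_TYPES],
    }


parsed = parse_geo_query("Ship traffic in Portuguese island")
# parsed["what"] is "Ship traffic", parsed["relation"] is "in",
# parsed["features"] is ["Portugal"], parsed["feature_types"] is ["island"]
```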

Geographic QE
1. Ship traffic in Portugal
2. Ship traffic in island
3. Ship traffic in Portuguese island
[Figure: ontology fragment with the nodes Europe, UK, Portugal, London, Lisbon, Isle of Wight, Sao Miguel and Isle of Man, with regions labelled 1, 2 and 3]

Geographic QE (2)
Scope of interest:
• All geographic concepts of type island that are part of Portugal
QSig:
• São Miguel, Madeira, Santa Maria, Formigas, Terceira, Graciosa, São Jorge, Pico, Faial, Flores, Corvo, Porto Santo, Desertas and Selvagens
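The expansion "all concepts of type island that are part of Portugal" can be illustrated on a toy ontology. The data in `ONTOLOGY` and the helpers `is_part_of` and `expand` are assumptions made for this sketch; the real system queries GKB 2.0, whose interface is not shown in the slides.

```python
# Toy ontology (hypothetical data): concept -> (feature type, part-of parent).
ONTOLOGY = {
    "Europe": ("continent", None),
    "Portugal": ("country", "Europe"),
    "UK": ("country", "Europe"),
    "Lisbon": ("city", "Portugal"),
    "London": ("city", "UK"),
    "Sao Miguel": ("island", "Portugal"),
    "Isle of Wight": ("island", "UK"),
    "Isle of Man": ("island", "Europe"),
}


def is_part_of(concept, ancestor):
    """Follow part-of links upwards until the ancestor or the root is reached."""
    parent = ONTOLOGY[concept][1]
    while parent is not None:
        if parent == ancestor:
            return True
        parent = ONTOLOGY[parent][1]
    return False


def expand(feature=None, feature_type=None):
    """Return the concepts matching the feature type and lying inside the feature."""
    result = []
    for concept, (concept_type, _) in ONTOLOGY.items():
        if feature_type is not None and concept_type != feature_type:
            continue
        if feature is not None and not is_part_of(concept, feature):
            continue
        result.append(concept)
    return sorted(result)


q_sig = expand(feature="Portugal", feature_type="island")  # ["Sao Miguel"]
```

Calling `expand(feature="Portugal")` or `expand(feature_type="island")` gives the wider scopes of the other two example queries.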

Term Expansion (1) ~ Blind Relevance Feedback ~ Before Relevance Feedback

Term Expansion (2) ~ Blind Relevance Feedback ~ After Relevance Feedback
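A generic form of blind relevance feedback is sketched below, assuming a plain frequency count over the top-ranked documents. The function name, its parameters and the term selection rule are illustrative assumptions; the slides do not state how expansion terms are chosen or weighted.

```python
from collections import Counter


def blind_relevance_feedback(query_terms, ranked_docs, top_docs=2, new_terms=2):
    """Add the most frequent non-query terms from the top-ranked documents."""
    query = [term.lower() for term in query_terms]
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(t for t in doc.lower().split() if t not in query)
    # Sort by descending frequency, then alphabetically, for a stable result.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return query + [term for term, _ in ordered[:new_terms]]


docs = ["ship traffic harbour ferry", "ferry ship harbour cargo", "football match"]
expanded = blind_relevance_feedback(["Ship", "traffic"], docs)
# expanded is ["ship", "traffic", "ferry", "harbour"]
```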

Text Mining (1)
§ parses the document for geonames
§ generates DSig
§ relies on a gazetteer of text patterns generated from the geographic ontology
Contains all concepts, represented by their feature name and respective feature type:
[<feature type> <feature name>] and [<feature type> $ <feature name>]
Example: Lisbon Airport of Lisbon

Text Mining (2)
Gazetteer:
city $ Lisbon: 1
Lisbon city: 1
district $ Lisbon: 2
Lisbon district: 2
Street $ Lisbon: 3
Lisbon Street: 3
(...)
Lisbon: 1, 2, 3, (...)

Document signatures (ID, followed by entries with their Conf. Meas in brackets):
LA072694-0011: 5668[1.00]; 2230[0.33]; 4555[0.33]; 4556[0.33]; 4557[0.33]
LA072694-0012: 5388[1.00]; 5389[1.00]; 5390[1.00]; 12097[1.00]; 6653[0.67]
Conf. Meas: normalized into [0, 1]
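The gazetteer lookup and signature generation might look roughly like the sketch below. Several points are assumptions, not taken from the slides: the `$` placeholder is read as the connector "of", an ambiguous bare name splits one unit of evidence evenly among its candidate concepts, and the scores are normalized by dividing by the highest one. `CONCEPTS`, `build_gazetteer` and `document_signature` are hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical ontology extract: concept ID -> (feature name, feature type).
CONCEPTS = {1: ("Lisbon", "city"), 2: ("Lisbon", "district"), 3: ("Lisbon", "street")}


def build_gazetteer(concepts):
    """Map lower-cased text patterns to the set of concept IDs they may denote."""
    gazetteer = defaultdict(set)
    for cid, (name, ftype) in concepts.items():
        gazetteer[f"{name} {ftype}".lower()].add(cid)     # e.g. "lisbon city"
        gazetteer[f"{ftype} of {name}".lower()].add(cid)  # "$" assumed to be "of"
        gazetteer[name.lower()].add(cid)                  # bare, ambiguous name
    return gazetteer


def document_signature(text, gazetteer):
    """Score concepts found in the text; scores are normalized into [0, 1]."""
    scores = defaultdict(float)
    remaining = text.lower()
    # Try longer (more specific) patterns first and blank out what they match.
    for pattern in sorted(gazetteer, key=len, reverse=True):
        regex = r"\b" + re.escape(pattern) + r"\b"
        hits = len(re.findall(regex, remaining))
        if hits:
            candidates = gazetteer[pattern]
            for cid in candidates:
                scores[cid] += hits / len(candidates)
            remaining = re.sub(regex, " ", remaining)
    top = max(scores.values(), default=1.0)
    return {cid: round(score / top, 2) for cid, score in scores.items()}


gazetteer = build_gazetteer(CONCEPTS)
d_sig = document_signature("The city of Lisbon is large. Lisbon has a port.", gazetteer)
# d_sig is {1: 1.0, 2: 0.25, 3: 0.25}
```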

Sidra 5: text indexing and ranking module with geographic capabilities, based on MG4J
• Generates a Geo Index and a Term Index
• Uses the QE terms and the query signature together with the Geo Index and the Term Index to rank document results

Flow chart of searches in Sidra 5

GeoScore
Geographic Similarity ~ GeoSim(s1, s2) combines four similarities (the diagram shows the weights 20%, 10%, 20% and 50%):
• Spatial Distance Similarity ~ DistSim(s1, s2)
• Spatial Adjacency Similarity ~ AdjSim(s1, s2)
• Population Similarity ~ PopSim(s1, s2)
• Ontology Similarity ~ OntSim(s1, s2)
Geographic Score ~ GeoScore(s1, s2)

GeoScore (2)
Geographic Score ~ GeoScore(s1, s2)

Example Computing of GeoScore
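A weighted combination of the four similarities can be sketched as below. The surviving slide text lists the weights 20%, 10%, 20% and 50% but not which similarity each belongs to, so the mapping in `DEFAULT_WEIGHTS` is an assumption, as is the function name `geo_sim`. How the final score is derived from this similarity is not recoverable from the text and is left out.

```python
# ASSUMED mapping of the slide's weights to the four similarities.
DEFAULT_WEIGHTS = {"ont": 0.5, "dist": 0.2, "adj": 0.2, "pop": 0.1}


def geo_sim(ont_sim, dist_sim, adj_sim, pop_sim, weights=DEFAULT_WEIGHTS):
    """Weighted sum of four similarities, each expected to lie in [0, 1]."""
    parts = {"ont": ont_sim, "dist": dist_sim, "adj": adj_sim, "pop": pop_sim}
    for name, value in parts.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} similarity must be in [0, 1], got {value}")
    return sum(weights[name] * value for name, value in parts.items())


# 0.5 * 1.0 + 0.2 * 0.5 + 0.2 * 0.0 + 0.1 * 1.0 = 0.7 (up to float rounding)
similarity = geo_sim(ont_sim=1.0, dist_sim=0.5, adj_sim=0.0, pop_sim=1.0)
```

Passing a different `weights` dictionary lets the mapping be changed once the original diagram is available.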

Document Scoring Textual Scoring

Experiment Type

Experiment Result
MAP Result: IR / GIR

Conclusion
• The best experiment setup is to generate an initial run with classic text retrieval, and use the full geographic ranking modules for the generation of the final run
• The GIR system is very dependent on the quality of the geographic ontology, and has some limitations in the text mining step

Thank you