“INEX 2005: Playground for XML-retrieval” Sergey Chernov

Why Do We Need XML Retrieval?*
*Slide is taken from Prabhakar Raghavan.
Sergey Chernov, Info Lunch at L3S, 12/20/2021

A Scenario for Desktop Search
Xuan searches for “the articles about multimedia conferences and workshops, which are titled ‘call for papers’ or ‘upcoming events’ and were recommended by Mounia”.
Query: multimedia workshop /title upcoming events /receivedFrom Mounia
[Figure: metadata graph around the matching document. Edge labels: storedFrom, receivedFrom, affiliatedTo, accessedFrom, publishedIn, title, text, type, issn, year, fn, given, family, uid. Node values: Mounia Lalmas (given: Mounia, family: Lalmas, uid: 123); Queen Mary Uni; http://inex.is.informatik.uni-duisburg.de/2005/index.html; publication (issn 1070-986X, year 1998); c:inex1.8xmlmu1998u40c2.xml; msgid: 00465; IEEE MULTIMEDIA; title “Upcoming Events”; text “1999 Multimedia Computing and Networking 1999 (MMCN 99) … This conference … multimedia systems…”]

What is INEX?*
*Slide is taken from Norbert Fuhr.

INEX in the Pictures
Norbert Fuhr, Saadia Malik, Ray Larson, Mounia Lalmas, Gabriella Kazai, Patrick Gallinari, Arjen P. de Vries, Birger Larsen, Roelof van Zwol, Shlomo Geva, Benjamin Piwowarski, Paul Ogilvie, Andrew Trotman, Börkur Sigurbjörnsson, Ludovic Denoyer

INEX in Numbers
• community: 58 research groups participated in 2005
• collection: 17000 IEEE articles from 1995-2004, 740 MB
• topics (queries): 87 in total, 40 CO+S and 47 CAS topics
• tracks: 7 (Adhoc, Relevance Feedback, Natural Language Processing, Heterogeneous, Interactive, Document Mining, Multimedia)
• publications over 4 years: >125
• important dates: April (start), November (finish)

Adhoc Track: Collection and Queries
• IEEE collection (journals and transactions)
• Language used for structural conditions: NEXI
• Topics (queries)
  - Content-only + Structure (CO+S): the structural part is OPTIONAL
  - Content and Structure (CAS): the structural part is MANDATORY
Example content: "call for papers" conference workshop +multimedia
Example structure: //article[about(.//atl, "upcoming events") OR about(.//atl, "call for papers")]//sec[about(., +multimedia conference workshop)]
Target element: //article//sec
Support elements: //article[about(.//atl, "upcoming events")]; //article[about(.//atl, "call for papers")]; //article//sec[about(., +multimedia conference workshop)]
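
The split of a NEXI query into a target path and its about() clauses can be illustrated with a small sketch. This is not a NEXI parser and not part of any INEX tooling; the function names target_path and about_clauses are hypothetical, and the regular expression assumes about() clauses without nested parentheses, as in the example above.

```python
import re

def target_path(query):
    """Drop every [...] filter; what remains is the target element path."""
    out, depth = [], 0
    for ch in query:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)

# Assumption: about(<relative path>, <terms>) with no nested parentheses.
ABOUT = re.compile(r"about\(\s*(\.[^,\s]*)\s*,\s*([^)]*)\)")

def about_clauses(query):
    """Return (relative path, content condition) pairs in query order."""
    return [(path, terms.strip()) for path, terms in ABOUT.findall(query)]

query = ('//article[about(.//atl, "upcoming events") OR '
         'about(.//atl, "call for papers")]'
         '//sec[about(., +multimedia conference workshop)]')

print(target_path(query))
for path, terms in about_clauses(query):
    print(path, "->", terms)
```

For the example query this yields the target path //article//sec and three about() clauses, two on .//atl and one on the section itself.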

Adhoc Track: Relevance Assessment Methodology
• Select the top 1500 components in a topic’s retrieval results
• Assess w.r.t. two dimensions
  - Exhaustivity (E), which describes the extent to which the document component discusses the topic (highly exhaustive, partially exhaustive, too small).
  - Specificity (S), which describes the extent to which the document component focuses on the topic.

Online Relevance Assessment System X-Rai

Adhoc: CO Retrieval Strategies
• CO.Focussed: find the most exhaustive and specific element in a path. Retrieved elements cannot contain any overlapping elements.
• CO.Thorough: find all highly exhaustive and specific elements. Overlap is considered an interface and results presentation issue.
• CO.FetchBrowse: first identify relevant articles, then identify the most exhaustive and specific elements within the fetched articles.
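
The no-overlap constraint of CO.Focussed can be sketched as a post-processing step over a CO.Thorough-style ranking. This is one simple greedy approach, not the method of any particular INEX participant; the element paths, scores, and the function names overlaps and remove_overlap are assumptions made for illustration.

```python
def overlaps(a, b):
    """True if the two element paths are equal or one is an ancestor of the other."""
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")

def remove_overlap(ranked):
    """Greedily keep the best-scoring elements that do not overlap a kept one."""
    kept = []
    for path, score in sorted(ranked, key=lambda r: r[1], reverse=True):
        if not any(overlaps(path, k) for k, _ in kept):
            kept.append((path, score))
    return kept

# Hypothetical ranked elements: (path, retrieval score).
ranked = [
    ("/article[1]/bdy[1]/sec[2]", 0.9),
    ("/article[1]/bdy[1]", 0.7),             # ancestor of the first element
    ("/article[1]/bdy[1]/sec[2]/p[1]", 0.6), # descendant of the first element
    ("/article[2]/bdy[1]/sec[1]", 0.5),
]

focussed = remove_overlap(ranked)
print(focussed)
```

Only the section from the first article and the section from the second article survive; the enclosing body and the nested paragraph are dropped because they overlap a better-scoring element.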

Adhoc: CAS Retrieval Strategies
• VVCAS: structural constraints in both the target elements and the support elements are interpreted as vague.
• SVCAS: target strict, support vague.
• VSCAS: target vague, support strict.
• SSCAS: target and support strict.
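
The naming scheme above (first letter for the target element, second for the support elements) can be decoded mechanically. The sketch below also shows one possible reading of a strict target constraint, namely filtering results by tag name; that reading, the sample paths, and the function names decode_cas, last_tag and apply_target are assumptions, not the official task definition.

```python
import re

def decode_cas(name):
    """Map e.g. 'SVCAS' to strictness flags for target and support elements."""
    if not re.fullmatch(r"[SV]{2}CAS", name):
        raise ValueError(f"not a CAS strategy name: {name!r}")
    return {"target_strict": name[0] == "S", "support_strict": name[1] == "S"}

def last_tag(path):
    """Tag name of the last step, e.g. '/article[1]/sec[2]' -> 'sec'."""
    return re.sub(r"\[\d+\]$", "", path.rsplit("/", 1)[-1])

def apply_target(results, target_tag, name):
    """Strict target: keep only elements with the target tag. Vague: keep all."""
    if not decode_cas(name)["target_strict"]:
        return list(results)
    return [(p, s) for p, s in results if last_tag(p) == target_tag]

# Hypothetical results for a query whose target element is //article//sec.
results = [
    ("/article[1]/bdy[1]/sec[2]", 0.9),
    ("/article[1]/bdy[1]/sec[2]/p[1]", 0.8),
]

print(apply_target(results, "sec", "SVCAS"))
print(apply_target(results, "sec", "VVCAS"))
```

Under SVCAS the paragraph is filtered out because it is not a sec element; under VVCAS both results remain and the structural hint would only influence ranking.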

Adhoc: Relevance Values (RV)

Adhoc: Metrics
• Consider:
  - Two dimensions of relevance
  - Independence assumption does not hold
  - No predefined retrieval unit
  - Overlap
• Extended Cumulated Gain (xCG) and its normalised version (nxCG)
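
As a rough illustration of the cumulated-gain idea, the sketch below computes xCG as the running sum of per-rank gain values and nxCG as the ratio of the run's xCG to that of an ideal ranking at each rank. It assumes the gain values have already been derived from the (E, S) assessments and deliberately ignores overlap handling; the gain numbers and the function names xcg and nxcg are made up for the example.

```python
from itertools import accumulate

def xcg(gains):
    """Cumulated gain at each rank: running sum of the per-rank gains."""
    return list(accumulate(gains))

def nxcg(run_gains, ideal_gains):
    """Normalise the run's cumulated gain by the ideal cumulated gain per rank."""
    n = len(run_gains)
    # Ideal ranking: best gains first, padded with zeros up to the run length.
    ideal = sorted(ideal_gains, reverse=True)[:n]
    ideal += [0.0] * (n - len(ideal))
    run_cum, ideal_cum = xcg(run_gains), xcg(ideal)
    return [r / i if i > 0 else 0.0 for r, i in zip(run_cum, ideal_cum)]

# Hypothetical gains: what the run returned at ranks 1..3, and all relevant gains.
run_gains = [0.5, 1.0, 0.0]
ideal_gains = [1.0, 0.5]

curve = nxcg(run_gains, ideal_gains)
print(curve)
```

A value of 1.0 at a rank means the run has accumulated as much gain as the ideal ranking could have by that rank.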

Adhoc: Competition
[Figure: the nxCG curves of runs in the CO.Thorough task with generalized quantization]

Other Tracks
• Relevance Feedback
  - Collection: IEEE
  - Goal: investigate relevance feedback in the context of XML retrieval. The approach should ideally consider not only the content but also the structural features of XML documents.
• Interactive
  - Collection: IEEE
  - Goal: investigate the behaviour of users when interacting with components of XML documents, and evaluate approaches to XML retrieval that are effective in user-based environments.
• Heterogeneous
  - Collection: Berkeley bib, FIZ Karlsruhe, Duisburg-Essen bib, DBLP, HCI resources, QMUL db, ZDNet
  - Goal: creation of a heterogeneous test collection, retrieval experiments with a small number of both CO and CAS queries, qualitative analysis of the results.

Other Tracks (continued)
• Multimedia
  - Collection: Lonely Planet document collection
  - Goal: an evaluation platform/forum for structured document retrieval systems that include more than text in the retrieval process.
• Document Mining
  - Collection: IMDb collection
  - Goal: generic tasks of classification and clustering.
• Natural Language Processing
  - Collection: any
  - Goal: design and build software that will analyse, understand, and generate results in response to queries that humans express naturally.

A Scenario for Desktop Search
Xuan searches for “the articles about multimedia conferences and workshops, which are titled ‘call for papers’ or ‘upcoming events’ and were recommended by Mounia”.
Query: multimedia workshop /title upcoming events /receivedFrom Mounia

Desktop Metadata Missing from INEX
• StoredFrom: Web links as sources of publications
• ReceivedFrom: email activity information, emails containing publications
• EmailAnnotations: email annotations (from the sender)
• SearchKeyword: search keywords that were used at a Web search engine to find the document
• OpenLast, MovedFrom: user action history with regard to the publications
• Annotation: user annotations
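
The metadata fields listed above could be attached to a document as a simple record, which is enough to express the "recommended by Mounia" part of the desktop search scenario. This is only a sketch: the class DesktopMetadata, its snake_case field names, and the helper recommended_by are hypothetical and do not correspond to any INEX or desktop search schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DesktopMetadata:
    """Hypothetical per-document record mirroring the fields on the slide."""
    title: str
    stored_from: Optional[str] = None      # StoredFrom: source Web link
    received_from: Optional[str] = None    # ReceivedFrom: email sender
    email_annotations: List[str] = field(default_factory=list)
    search_keywords: List[str] = field(default_factory=list)
    open_last: Optional[str] = None        # OpenLast: last time opened
    moved_from: Optional[str] = None       # MovedFrom: previous location
    annotations: List[str] = field(default_factory=list)

def recommended_by(docs, sender, title_phrases):
    """Documents received from `sender` whose title contains one of the phrases."""
    sender = sender.lower()
    return [
        d for d in docs
        if d.received_from and sender in d.received_from.lower()
        and any(p.lower() in d.title.lower() for p in title_phrases)
    ]

docs = [
    DesktopMetadata(title="Upcoming Events", received_from="Mounia Lalmas"),
    DesktopMetadata(title="Call for Papers"),  # no email provenance recorded
]

hits = recommended_by(docs, "Mounia", ["call for papers", "upcoming events"])
print([d.title for d in hits])
```

The second document matches the title condition but is not returned, because without ReceivedFrom information there is no way to tell who recommended it.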

Challenges for Designing a Dataset for Desktop
• Data obtained through logging
  - Pros: real data
  - Cons: privacy issues, a high level of user cooperation is required, low scalability
• Data created through simulations
  - Pros: scalable, easy to modify, cheap, fewer restrictions regarding privacy
  - Cons: can be based on wrong assumptions

Last slide
Thanks a lot and Merry Christmas!