What’s happening at NTCIR
Noriko Kando, National Institute of Informatics
http://research.nii.ac.jp/ntcir/
kando (at) nii.ac.jp
ntcir5-clef, 2005-09-22, Noriko Kando

NTCIR Workshop is: a series of evaluation workshops designed to enhance research in information access technologies by providing infrastructure for large-scale evaluation.
• Project started late 1997; once every 1½ years
  – 1st: Nov. 1, 1998 to Sept. 1, 1999
  – 2nd: June 2000 to March 2001
  – 3rd: Sept. 2001 to Oct. 2002
  – 4th: Apr. 2003 to June 2004
  – 5th: Oct. 2004 to Dec. 2005
* NTCIR = NII Test Collection for Information Retrieval systems
* Co-sponsored by NII and the MEXT Grant-in-Aid on Informatics

Focus of NTCIR
• Lab-type IR test
• Asian languages / cross-language
• Variety of genres
• Parallel/comparable corpora
• New challenges
• Intersection of IR + NLP: to make information in the documents more usable for users!
• Realistic evaluation / user tasks
• Forum for researchers: idea exchange; discussion/investigation of evaluation methods and metrics

Tasks (Research Areas) of NTCIR Workshops
Project started late 1997; about once every 1½ years. Timeline chart of the tasks run at the 1st to 5th workshops (Nov. 1998 to Sept. 2004):
• Japanese IR
• Cross-lingual IR
• Patent Retrieval
• Web Retrieval
• Term Extraction / Role Analysis
• Question Answering
• Cross-Language Question Answering
• Text Summarization
• [Pilot] Trend Info

NTCIR-5 (Meeting: Dec. 6-9, 2005). You are most welcome!
• CLIR: focus on named entities (NE) and out-of-vocabulary (OOV) terms; news documents from 2000-2001, CJK
• CLQA: E-C, C-C, E-J (pilot, new)
• Patent Retrieval:
  – Invalidity search; 10 years of patent full text, ca. 90 GB
  – Text categorization into F-terms (good granularity for patent-map axes)
• QAC: series of questions (J-J)
• WEB: navigational retrieval; new 1.5 TB document set
• [Pilot] MuST: MUltimodal Summarization for Trend information: extract numeric information from a set of documents and visualize it to show trends

Schedule for NTCIR-5
[Tasks]
• Dec. 2004: Document release
• April-July 2005: Formal runs
• 1 Sept. 2005: Evaluation results returned (except CLIR)
• 15 Oct. 2005: Paper submission
• 6-9 Dec. 2005: Conference at NII, Tokyo, Japan
* Proceedings will be published at the conference.
[Open Submission]
• 1 Oct. 2005: Paper due
• 1 Nov. 2005: Late-breaking short paper due
• 15 Nov. 2005: Notification

NTCIR Workshop: Number of Participating Groups
NTCIR-5: 102 groups from 15 countries registered.
[Chart: number of participating groups and countries for each workshop]

Number of Participants by Task, Registered for NTCIR-5
[Chart: participants per task and language pair (e.g., J-E, E-J, E-C, CJEK)]
102 groups from 15 countries registered.

Number of Participants by Task, Submitted Results
[Chart: participants per task and language pair (e.g., J-E, E-J, E-C, CJEK)]
77 active participants from 15 countries.

Geographical Distribution of Participants
Finland, Germany, Ireland, Netherlands, Spain, Switzerland, Canada, USA, China PRC, Hong Kong, Japan, Korea, Singapore, Taiwan ROC, Australia

NTCIR Workshop 5 (2004-2005) Organizers
+CLIR: Hsin-Hsi Chen (NTU), Kuang-hua Chen (NTU), Kazuaki Kishida (Surugadai U), Kazuko Kuriyama (Shirayuri U), Sukhoon Lee (NCU), Sung Hyon Myaeng (IIU), Noriko Kando (NII)
+CLQA: Kuang-hua Chen (NTU), Chuan-Jie Lin (National Taiwan Ocean U), Yutaka Sasaki (ATR)
+PATENT: Atsushi Fujii (Tsukuba U), Makoto Iwayama (Hitachi/TITEC), Noriko Kando (NII)
+QA: Junichi Fukumoto (Ritsumeikan U), Tsuneaki Kato (U Tokyo), Fumito Masui (Mie U)
+WEB: Keizo Oyama (NII), Masao Takaku (NII)
+Trend Info [Pilot]
Program chair: Noriko Kando (NII)

NTCIR Test Collections
• NTCIR-1 (IR): academic, 577 MB; docs: J, E; topics: J; relevance: 3 grades
• CIRB010 (IR): news, 132 MB; docs: Ct; topics: Ct, E; relevance: 4 grades
• NTCIR-2 (IR): academic, 800 MB; docs: J, E; topics: J, E
• NTCIR-2 Summ (summarization): news, 180 docs; J
• NTCIR-3 CLIR (IR): news, 884 MB; docs: Ct, K, J, E; relevance: 4 grades
• NTCIR-3 PATENT (IR): patents, 18 GB (+5 GB); docs: J (J, E); topics: Cs, Ct, K, J, E; relevance: 3 grades
• NTCIR-3 QA (QA): news, 282 MB; docs: J; questions: J (E); answers: exact
• NTCIR-3 Summ (summarization): news, 60 docs + 50 sets; J; no relevance grades
• NTCIR-3 WEB (IR): Web, 100 GB; docs: multiple; topics: J (E); relevance: 4 grades + relative
• NTCIR-4 CLIR (IR): news, ca. 3 GB; docs: Ct, K, J, E; relevance: 4 grades
• NTCIR-4 PATENT (IR): patents, 45 GB; docs: J (J, E); topics: Cs, Ct, K, J, E; relevance: 3 grades
• NTCIR-4 QA (QA): news, 776 MB; docs: J; questions: J (E); relevance/answer: 4
• NTCIR-4 Summ (summarization): news, 30 sets; J; no relevance grades
• NTCIR-4 WEB (IR): Web, 100 GB; docs: multiple; topics: J (E)
Ct: Traditional Chinese, Cs: Simplified Chinese, K: Korean, J: Japanese, E: English

Situation on the Data Distribution for Research-Purpose Use of NTCIR Test Collections
[Chart]

What’s New to NTCIR-4
• Open submission session
• ACM-TALIP special issue recommendation
• Open attendance
• Research-purpose use of the submitted raw data: started with NTCIR-3 CLIR and will be enlarged
• Online working notes and slides

What’s New to NTCIR-5
• Open submission: continued
• ACM-TALIP special issue recommendation (the strategy needs changing), but a special issue on patents at IP&M
• Open attendance: continued
• Research-purpose use of the submitted raw data: continued
• Online working notes and slides: replaced by proceedings at the conference only (no working notes)
• Pilot tasks and feasibility studies using different funding schemes, e.g., multimodal trend information [co-funded by NTT and Tokyo U] and "why" questions with automatic "pyramid" evaluation [with ISI/USC]

Acknowledgment
Central Daily News; China Times Inc.; Chosunilbo; Hankooki.com; Industrial Property Cooperation Center; Japan Patent Office; Japan Patent Information Organization; Korea Economic Daily; Linguistic Data Consortium; Mainichi Newspaper; Nippon Database Kaihatsu Co. Ltd.; NTT; NRI Cyber Patent; PATOLIS; the Sing Tao Group; Taiwan News; Tokyo Univ.; UDN.COM; Wisers Information Ltd.; Yomiuri Shinbun

Cross-Language Information Retrieval (CLIR) Task
Organizers: Kazuaki Kishida*, Kuang-hua Chen, Sukhoon Lee, Hsin-Hsi Chen, Koji Eguchi, Noriko Kando, Kazuko Kuriyama, Sung Hyon Myaeng

NTCIR-5 CLIR
• 50 topics
• Documents: Chinese (traditional), Japanese, Korean, English; 1.6 M docs, 3.3 GB, published in 1998-1999
• Topic and document languages combined across all pairs of C, J, K, E (diagram)
• Short queries: D-only and T-only runs are mandatory
• Background information of the search requests is given
• Balance between topic types:
  – specific (e.g., a particular event) vs. generic
  – with vs. without proper nouns
  – domestic / regional / international

Design of CLIR Task
• Subtasks (notation: topic language - document language)
  – Multilingual CLIR (MLIR): e.g., C-CJKE
  – Bilingual CLIR (BLIR): e.g., C-J
  – Single Language IR (SLIR): e.g., C-C
  – Pivot Bilingual CLIR (PLIR): e.g., C-E-J
• Languages: Chinese (C), Japanese (J), Korean (K), English (E)
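Pivot bilingual CLIR (e.g., C-E-J) translates through an intermediate language when no direct resource is available. The following is a minimal sketch of transitive, dictionary-based query translation. It assumes two hypothetical bilingual word lists (source to pivot, pivot to target) and is not any participant's actual method. Ambiguity multiplies at each hop, so this sketch weights each target word by the number of pivot paths that reach it.

```python
from collections import defaultdict

def compose_dictionaries(src_to_pivot, pivot_to_tgt):
    """Build a source->target dictionary by chaining through the pivot language.

    Each target word is weighted by how many pivot paths reach it; the result
    lists the best-supported translations first (ties broken alphabetically).
    """
    composed = {}
    for src_word, pivot_words in src_to_pivot.items():
        counts = defaultdict(int)
        for p in pivot_words:
            for t in pivot_to_tgt.get(p, []):
                counts[t] += 1
        if counts:
            composed[src_word] = sorted(counts, key=lambda t: (-counts[t], t))
    return composed

def translate_query(terms, dictionary):
    """Replace each term by its candidate translations; unknown (OOV) terms pass through."""
    out = []
    for term in terms:
        out.extend(dictionary.get(term, [term]))
    return out

# Hypothetical mini-dictionaries: Traditional Chinese -> English -> Japanese.
c_to_e = {"電腦": ["computer"]}
e_to_j = {"computer": ["コンピュータ", "計算機"]}
c_to_j = compose_dictionaries(c_to_e, e_to_j)
print(translate_query(["電腦", "NTCIR"], c_to_j))
```

Real PLIR systems add the translation disambiguation and OOV handling listed among the NTCIR-4 techniques. This sketch only shows the chaining step.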

Documents for CLIR
• NTCIR-3 (870 MB): Chinese (traditional), Japanese 220K docs, English 23K docs, published in 1998-1999; Korean 66K docs, published in 1994
• NTCIR-4 (3.3 GB, published in 1998-1999): Chinese (traditional) 380K docs, Korean 250K docs, Japanese 590K docs, English 350K docs
• NTCIR-5 (published in 2000-2001): Chinese (traditional) 901K docs, Korean 220K docs, Japanese 858K docs, English 259K docs
Every language has multiple sources.

Test Collection
• Queries: 50 topics
• Relevance judgments: 4 grades: Highly Relevant (S), Relevant (A), Partially Relevant (B), Non-Relevant (C)
• Mandatory runs: TITLE-only run, DESC-only run
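NTCIR CLIR results are commonly reported under "rigid" relevance (only S and A count as relevant) and "relaxed" relevance (S, A, and B count). The "Rigid" label on the NTCIR-4 results slide refers to the former. The following is a minimal sketch of non-interpolated average precision and MAP under either mapping. The qrels layout (a dict from doc ID to grade) and the doc IDs are hypothetical.

```python
RIGID = {"S", "A"}
RELAXED = {"S", "A", "B"}

def average_precision(ranking, qrels, relevant_grades):
    """Non-interpolated AP for one topic.

    ranking: doc IDs in system order; qrels: doc ID -> grade ('S', 'A', 'B', 'C');
    relevant_grades: which grades count as relevant (RIGID or RELAXED).
    """
    relevant = {d for d, g in qrels.items() if g in relevant_grades}
    if not relevant:
        return 0.0  # some tools instead skip topics with no relevant docs
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Divide by all relevant docs, so relevant docs that are never retrieved lower AP.
    return precision_sum / len(relevant)

def mean_average_precision(runs, qrels_by_topic, relevant_grades):
    """runs: topic ID -> ranking; averages AP over every topic that has qrels."""
    topics = list(qrels_by_topic)
    return sum(average_precision(runs.get(t, []), qrels_by_topic[t], relevant_grades)
               for t in topics) / len(topics)

qrels = {"d1": "S", "d2": "B", "d3": "A", "d4": "C"}
ranking = ["d2", "d1", "d4", "d3"]
print(average_precision(ranking, qrels, RIGID), average_precision(ranking, qrels, RELAXED))
```

In this toy topic, d2 (grade B) helps only the relaxed score. This is why rigid and relaxed MAP can rank systems differently.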

Result Submission
• 24 groups submitted results
  – From Australia, Canada, China PRC, Finland, Germany, Hong Kong, Japan, Korea, Netherlands, Singapore, Spain, Switzerland, Taiwan, USA (14 countries and areas)

Techniques Used (NTCIR-4)
• Indexing, stop words, decompounding
• Mostly query translation, but one bi-directional approach
• Query and document translation: MT, machine-readable dictionaries (MRD), parallel corpora
• Translation disambiguation
• Out-of-vocabulary (OOV) problem: use of Web resources; transliteration; cognates
• Query expansion techniques: pseudo-relevance feedback, FPRF; use of knowledge ontologies
• Merging strategies
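In multilingual CLIR, the per-language result lists must be merged into one ranking. Several strategies exist, for example raw-score merging, round-robin interleaving, and normalized-score merging. The slide does not say which ones participants used. The following is a minimal sketch of one of them, min-max score normalization followed by merging on the normalized score. The input layout (language to a list of (doc ID, score) pairs) is a hypothetical choice.

```python
def minmax_normalize(results):
    """results: list of (doc_id, score). Returns (doc_id, score scaled to [0, 1])."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(d, 1.0) for d, _ in results]
    return [(d, (s - lo) / (hi - lo)) for d, s in results]

def merge_normalized(lists_by_language, depth=1000):
    """Merge per-language runs by normalized score and keep the top `depth` entries.

    Ties are broken by language code and then doc ID, so the output is deterministic.
    """
    pooled = []
    for lang, results in lists_by_language.items():
        for doc, score in minmax_normalize(results):
            pooled.append((score, lang, doc))
    pooled.sort(key=lambda x: (-x[0], x[1], x[2]))
    return [(lang, doc) for _, lang, doc in pooled[:depth]]

runs = {
    "J": [("j1", 12.0), ("j2", 6.0), ("j3", 2.0)],
    "E": [("e1", 0.9), ("e2", 0.3)],
}
print(merge_normalized(runs))
```

Normalization makes scores from differently scaled retrieval runs comparable. However, it forces each language's best document to 1.0, even when one language has no truly relevant documents.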

Homework from NTCIR-4: Best SLIR and BLIR Runs (D-runs, Rigid)
MAP, and bilingual MAP as % of the monolingual MAP for the same document language:
• Chinese documents: C-C .3255; J-C .0548 (16.8%); K-C .1447 (44.5%); E-C .0663 (20.4%)
• Korean documents: K-K .4685; C-K .3973 (84.8%); J-K .3984 (85.0%); E-K .3249 (69.3%)
• Japanese documents: J-J .3804; C-J .2309 (60.7%); K-J .2935 (77.2%); E-J .2674 (70.3%)
• English documents: E-E .3469; C-E .2238 (64.5%); J-E .3340 (96.2%); K-E .2250 (64.9%)
Extremely high!
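Each percentage is the bilingual MAP divided by the monolingual MAP for the same document language. The following sketch recomputes the ratios from the MAP values on the NTCIR-4 homework slide. The dictionary layout is simply a convenient choice.

```python
# MAP values from the NTCIR-4 homework slide (D-runs, rigid relevance).
MONOLINGUAL = {"C": 0.3255, "K": 0.4685, "J": 0.3804, "E": 0.3469}
BILINGUAL = {
    ("J", "C"): 0.0548, ("K", "C"): 0.1447, ("E", "C"): 0.0663,
    ("C", "K"): 0.3973, ("J", "K"): 0.3984, ("E", "K"): 0.3249,
    ("C", "J"): 0.2309, ("K", "J"): 0.2935, ("E", "J"): 0.2674,
    ("C", "E"): 0.2238, ("J", "E"): 0.3340, ("K", "E"): 0.2250,
}

def percent_of_monolingual(source, target):
    """Bilingual MAP (topics in `source`, docs in `target`) as % of the target's monolingual MAP."""
    return 100.0 * BILINGUAL[(source, target)] / MONOLINGUAL[target]

for src, tgt in sorted(BILINGUAL):
    print(f"{src}-{tgt}: {percent_of_monolingual(src, tgt):.1f}%")
```

Recomputing from the rounded four-digit MAPs reproduces the slide's figures except for the last digit of J-E, which comes out as 96.3 rather than 96.2. The slide was presumably computed from unrounded values.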

Patent Retrieval Task
Organizers: Atsushi Fujii (Univ. of Tsukuba), Makoto Iwayama (TIT/Hitachi), Noriko Kando (NII)

Patent Retrieval Tasks: Situation and Users’ Information-Seeking Task
• NTCIR-3 PATENT (2001-2002): technological survey, newspaper → patent applications: search patents by a newspaper article. End users: non-experts (e.g., business managers).
• NTCIR-4 PATENT (2003-2004; 5 years, 45 GB) and NTCIR-5 PATENT (2004-2005; 10 years, 90 GB): invalidity search, patent application → patent claims: from a claim of a new patent application, search for patents that can invalidate it. Users: patent experts.

NTCIR-4 Patent (2003-2004)
TOPICS
• Main task: search patents by a patent: text retrieval + relevant passage pinpointing (34 manual + 69 automatic topics)
• Topics (patent claims) in Japanese, with translations into English, Chinese (traditional), Chinese (simplified), and Korean
• Feasibility study: automatic patent map creation: make a table from a set of relevant patents on a topic (more than 100 patents) to see technology trends (text mining)
DOCUMENTS
• Ca. 3.5 M docs, ca. 45 GB: Japanese patents (1993-1997), full text with the author’s abstract (in Japanese)
• Abstracts in English (1993-1997) by professional abstractors
• 3.5 million docs from 1993-97 are used for evaluation

NTCIR-5 Patent (2004-2005)
TOPICS
• Search patents by a patent: text retrieval + relevant passage pinpointing (34 manual + 1189 automatic topics)
• Topics (patent claims) in Japanese, with English translations
• Passage retrieval
• F-term classification
DOCUMENTS
• Ca. 7 M docs, ca. 90 GB: Japanese patents (1993-2002), full text with the author’s abstract (in Japanese)
• Abstracts in English (1993-2002) by professional abstractors

Search Topics
• Japanese patent applications rejected by the Japan Patent Office (JPO)
• 34 main topics: selected and judged by human patent experts of the Japan Intellectual Property Association (JIPA) (created at NTCIR-4)
• 1189 additional topics: applications rejected by the JPO; evaluated using only the citations
• Very few relevant documents per topic

Example Search Topic
Relevant documents must be prior art, i.e., open to the public before the topic patent was filed (date of filing: FDATE). The claim is the target for invalidation.
<TOPIC>
<NUM>008</NUM>
<LANG>EN</LANG>
<FDATE>19960527</FDATE>
<CLAIM>(Claim 1) A sensor device, characterized in that an open recessed part is formed on a box-shaped forming base, a conductive film of a designated pattern is formed on the surface of the forming base including the inner surface of the recessed part, an element for a sensor is bonded to the recessed part, and the forming base is closed with a cover.</CLAIM>
. . .
</TOPIC>
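Because only documents published before the filing date can be prior art, a system can read FDATE from the topic and discard later documents before ranking. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It contains only the elements shown in the example topic; the elements elided there are left out. `can_be_prior_art` and the candidate publication dates are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

# Only the elements shown in the example topic; elided elements are omitted.
TOPIC_XML = """<TOPIC>
<NUM>008</NUM>
<LANG>EN</LANG>
<FDATE>19960527</FDATE>
<CLAIM>(Claim 1) A sensor device, characterized in that an open recessed part is formed on a box-shaped forming base, a conductive film of a designated pattern is formed on the surface of the forming base including the inner surface of the recessed part, an element for a sensor is bonded to the recessed part, and the forming base is closed with a cover.</CLAIM>
</TOPIC>"""

def parse_topic(xml_text):
    """Extract topic number, language, filing date (FDATE, YYYYMMDD), and claim text."""
    root = ET.fromstring(xml_text)
    return {
        "num": root.findtext("NUM"),
        "lang": root.findtext("LANG"),
        "filing_date": datetime.strptime(root.findtext("FDATE"), "%Y%m%d").date(),
        "claim": root.findtext("CLAIM").strip(),
    }

def can_be_prior_art(candidate_publication_date, topic):
    """Hypothetical filter: a candidate must have been public strictly before the filing date."""
    return candidate_publication_date < topic["filing_date"]

topic = parse_topic(TOPIC_XML)
print(topic["num"], topic["filing_date"], can_be_prior_art(date(1995, 3, 1), topic))
```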

Relevance Judgment
• Document-based relevance judgment
  – A: a patent that can invalidate the topic claim
  – B: a patent that can invalidate the topic claim when used together with other patents
• Passage-based relevance judgment: combinational relevance
• Submitted runs were evaluated by mean average precision (MAP)

NTCIR-4: Two-Stage Refinement
[Figure]

NTCIR-5 Document Retrieval (A+B relevance), Manual Queries
[Chart of results]

Passage Retrieval
• Provide topics and their relevant documents
  – NTCIR-4 topics: 41 (dry run 7, formal run 34)
  – Relevant docs: 378
  – Task: sort (rank) the passages in the relevant docs

Example Results File for Passage Retrieval
One line per retrieved passage, with the fields labelled on the slide:
• Topic ID (e.g., 0001, 0002, ...)
• A field that is always 0
• Passage ID (e.g., 1993-123456-5, 1993-123456-3, 1993-123456-0, 1994-000002-3, 1994-000002-1)
• Rank (1, 2, 3, 4, 5, ...)
• Score (e.g., 9999)
• Run ID (e.g., ntc1)
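The following is a minimal parser sketch for such a results file. It makes two assumptions that the slide does not confirm. First, the fields appear in TREC-like order: topic ID, the constant 0, passage ID, rank, score, run ID. Second, a passage ID such as 1993-123456-5 is a document ID followed by a passage number. `PassageResult` and `parse_line` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class PassageResult:
    topic_id: str
    passage_id: str
    rank: int
    score: float
    run_id: str

    @property
    def doc_id(self):
        # Assumption: everything before the last hyphen identifies the document.
        return self.passage_id.rsplit("-", 1)[0]

    @property
    def passage_no(self):
        # Assumption: the last hyphen-separated field is the passage number.
        return int(self.passage_id.rsplit("-", 1)[1])

def parse_line(line):
    """Parse one result line; assumed field order: topic, 0, passage ID, rank, score, run ID."""
    topic_id, zero, passage_id, rank, score, run_id = line.split()
    if zero != "0":
        raise ValueError(f"second field should always be 0: {line!r}")
    return PassageResult(topic_id, passage_id, int(rank), float(score), run_id)

r = parse_line("0001 0 1993-123456-5 1 9999 ntc1")
print(r.doc_id, r.passage_no, r.rank, r.score, r.run_id)
```

Grouping the parsed lines by `(topic_id, doc_id)` gives the per-document passage order needed for search-length evaluation.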

Evaluation of Passage Retrieval
• MAP: reflects both recall and precision
• Expected Search Length (ESL): reflects precision

Example: Evaluation by ESL
[Figure: passages of the relevant docs in ranked order, with relevant passages marked; search length = 5]
• Evaluate by the number of passages (the search length) the user reads before obtaining sufficient evidence
• Average the search length over the relevant docs
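Cooper's original expected search length counts the non-relevant items a user must examine. The slide adapts the idea to the number of passages read within each relevant document. The following is a minimal sketch of that adaptation, under two assumptions not stated on the slide. First, "sufficient evidence" is given as a list of passage sets, each of which is enough on its own: a single passage or, for combinational relevance, a combination. Second, a user who never reaches sufficient evidence reads every passage. The function names are hypothetical.

```python
def search_length(ranked_passages, sufficient_sets):
    """Number of passages read, in system order, until some sufficient set is fully seen.

    ranked_passages: passage IDs of one relevant document, in the run's order.
    sufficient_sets: list of sets; any one set counts as sufficient evidence.
    """
    seen = set()
    for n, passage in enumerate(ranked_passages, start=1):
        seen.add(passage)
        if any(s <= seen for s in sufficient_sets):
            return n
    return len(ranked_passages)  # assumption: the user reads everything

def mean_search_length(per_doc):
    """per_doc: list of (ranked_passages, sufficient_sets), one entry per relevant document."""
    lengths = [search_length(r, s) for r, s in per_doc]
    return sum(lengths) / len(lengths)

docs = [
    ([5, 3, 0, 1, 2], [{0}, {1, 2}]),  # passage 0 alone suffices: length 3
    ([3, 1], [{1}]),                   # passage 1 suffices: length 2
]
print(mean_search_length(docs))
```

Lower is better. Unlike MAP, the measure ignores everything after the point where sufficient evidence is found.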

PR Curve by Macro Average
[Chart; baseline: passages in order of passage ID]

NTCIR-4 Feasibility Study: Automatic Patent Map Generation
[Diagram: search topic → application retrieval → documents (JAPIO abstracts, PAJ) → classification → visualization as a multi-dimensional matrix; topics and documents from the NTCIR-3 collection]

Example (blue light-emitting diode)
A patent map table. Columns are the problems to be solved (crystalline, reliability, long operating life, emission stability, emission intensity). Rows are the solutions (structure of active layer, electrode composition, electrode arrangement, structure of light-emitting element). Participants identify the rows and columns.
Patent applications in the table cells: 1998-145000, 1998-233554, 1998-107318, 1998-190063, 1998-209498, 1998-209495, 1998-215034, 1998-223930, 1998-242518, 1998-173230, 1998-209499, 1998-256602, 1998-242515, 1998-270757, 1998-135516, 1998-242586, 1998-247761, 1998-135514, 1998-256668, 1998-012923, 1998-247745, 1998-256597

NTCIR-4 FS (Patent Map): Lessons Learned
• Classification (clustering): very good
• Labeling the clusters: future work
• "Solution" only
• Too small a number of topics
• Evaluation: insufficient; cross-system evaluation was not possible

NTCIR-5 F-Term Classification
• Use an existing classification (F-terms)
  – Many topics
  – Cross-system evaluation
• F-term: multi-perspective classification
  – Can be used for automatic patent map creation

Tasks
• Topic classification: assign a topic to each patent (or abstract)
• F-term classification: assign F-terms to patents (or abstracts) within a specific topic

Purpose of the Tasks
• Topic classification: classification of structured documents
• F-term classification: multi-perspective classification

Question Answering Challenge (QAC) Task
Organizers: Jun'ichi Fukumoto, Tsuneaki Kato, Fumito Masui

Question Answering Challenge at NTCIR
Subtask 3: a series of questions
• Report-writing task: topic-centered vs. browsing; evaluated by F-measure
• Exact answers
• Results returned within 48 hours
• Document IDs are required as supporting information
Same as Subtask 3 at NTCIR-4.

Series of Questions: Situation Settings (User’s Task)
1. Collecting information about a particular topic
  – One (hidden) global topic and a series of questions on its subtopics
2. Browsing along transitive interests
  – The topic or focus of the questions shifts through the interaction between the user and the system
  – Local coherence with the previous question only

Relation to Multi-Document Summarization
Answering a series of questions is closely related to multi-document summarization:
  – A series of questions covers the subtopics that should be contained in a summary, so it can be used as "quality questions"
  – Summarization as pre-processing for QA?
  – QA as pre-processing for abstract-type summary generation?

Example of a Series of Questions (hidden global topic = Seiji Ozawa)
Series 14: strictly gathering type
• When was Seiji Ozawa born?
• Where was he born?
• Which university did he graduate from?
• Who did he study under?
• Who recognized him?
• Which orchestra was he conducting in 1998?
• Which orchestra will he begin to conduct in 2002?

Example of a Series of Questions (browsing type: topics shift)
Series 22: browsing type
• Which stadium is home to the New York Yankees?
• When was it built?
• How many persons' monuments have been displayed there?
• Whose monument was displayed in 1999?
• When did he come to Japan on his honeymoon?
• Who was the bride at that time?
• Who often draws pop art using her as a motif?
• What company's cans did he also often draw?

NTCIR-4: Evaluation by MMF
[Chart]

NTCIR-4: Differences by Series Type
[Chart]

Homework from NTCIR-4: Problems in Evaluation
One set of all the answers == F-measure
• Multiple answers and context
  Ex. Q1: Countries in East Asia? System answer: PRC, ROC, N. Korea, S. Korea, UK (UK is a wrong answer)
  Q2: Capitals of these countries? System answer: Beijing, Taipei, Pyongyang, Seoul, Tokyo
  Tokyo is a correct answer for Q2, but this system produced a wrong answer for Q1, and Tokyo is not the capital of the UK.
• Expression diversity and identification of identical answers
  Ex. Are A and B the same answer or not? The total number of correct answers, and hence the recall value, depends on such decisions.
• Major and minor answers
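When a question has a list of answers, the system's answer set is scored with precision, recall, and F-measure against the correct set. The following is a minimal sketch showing how an equivalence decision changes the number of matched answers and therefore recall. It is a plain set-based F-measure, not the modified F-measure (MMF) used at NTCIR-4. The answer strings and the `aliases` map are hypothetical placeholders.

```python
def normalize(answer, aliases):
    """Map an answer to a canonical key; `aliases` encodes 'same answer' decisions."""
    key = answer.strip().lower()
    return aliases.get(key, key)

def list_answer_prf(system_answers, gold_answers, aliases=None):
    """Set-based precision, recall, and F-measure for one list question."""
    aliases = aliases or {}
    sys_set = {normalize(a, aliases) for a in system_answers}
    gold_set = {normalize(a, aliases) for a in gold_answers}
    correct = len(sys_set & gold_set)
    precision = correct / len(sys_set) if sys_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

gold = ["Alpha Corp", "Beta Inc", "Gamma Ltd"]
system = ["alpha corporation", "Beta Inc", "Delta"]
print(list_answer_prf(system, gold))                                       # strict matching
print(list_answer_prf(system, gold, {"alpha corporation": "alpha corp"}))  # with an alias
```

The context problem (Q2 answers that depend on a wrong Q1 answer) is not modeled here. Handling it needs per-series bookkeeping of which earlier answers the system committed to.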

Cross-Language Question Answering Task
Organizers: Kuang-hua Chen (NTU), Chuan-Jie Lin (National Taiwan Ocean U), Yutaka Sasaki (ATR)

NTCIR Cross-Lingual Question Answering (CLQA1)
• Language pairs: J->E, E->J, C->E, C->C, E->C
• Target: questions about named entities (PERSON, DATE, SPEED, …) in newspaper articles
• Example: the question "Who is the Japanese Prime Minister?" is translated across languages (J, E, C) and matched against articles such as "小泉は…" (Koizumi …); answer: "小泉" (Koizumi)
Organizers: Japanese: Yutaka Sasaki (ATR); Chinese: Hsin-Hsi Chen, Kuang-hua Chen, Chuan-Jie Lin (NTU)

Traditional QA
"Who is the Prime Minister of Japan?"
1. Question analysis: Q type = PERSON
2. Document retrieval: answer-candidate docs, e.g., "日本の小泉首相がブッシュ大統領と…" (Japan's Prime Minister Koizumi, with President Bush …) and "森前首相は訪問先の…" (Former Prime Minister Mori, at the place he was visiting, …); candidates: Mori, Koizumi, Bush
3. Select answers: rank the answer candidates according to their relationships in the documents: 1. Koizumi 2. Mori
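The traditional pipeline maps the question to an expected answer type, keeps only named entities of that type, and ranks them by their relation to the question in the retrieved text. The following is a toy sketch under loud assumptions. The question type comes from a hypothetical wh-word rule table. The candidates and their context phrases stand in for NE-tagger output and are English glosses of the slide's Japanese snippets. "Relationship in the documents" is approximated by counting the question keywords that appear in each candidate's context.

```python
# Hypothetical rule table standing in for real question analysis.
QTYPE_BY_WH_WORD = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}
STOP_WORDS = {"who", "when", "where", "is", "was", "the", "of", "a", "an"}

def question_type(question):
    first = question.split()[0].lower()
    return QTYPE_BY_WH_WORD.get(first, "OTHER")

def keywords(text):
    words = {w.strip("?.,!\"'").lower() for w in text.split()}
    return words - STOP_WORDS

def rank_candidates(question, candidates):
    """candidates: (answer, ne_type, context) triples, e.g. from an NE tagger.

    Keep candidates whose NE type matches the question type, score each by the
    number of question keywords in its context, and drop zero-score candidates.
    """
    qtype = question_type(question)
    qwords = keywords(question)
    scored = []
    for answer, ne_type, context in candidates:
        if ne_type != qtype:
            continue
        score = len(qwords & keywords(context))
        if score > 0:
            scored.append((score, answer))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [answer for _, answer in scored]

candidates = [
    ("Mori", "PERSON", "former Prime Minister Mori"),
    ("Koizumi", "PERSON", "Prime Minister Koizumi of Japan"),
    ("Bush", "PERSON", "President Bush"),
    ("Japan", "LOCATION", "Prime Minister Koizumi of Japan"),
]
print(rank_candidates("Who is the Prime Minister of Japan?", candidates))
```

Real systems replace the rule table with richer question analysis, or with machine-learned classifiers as on the next slide. They also use stronger evidence than keyword overlap.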

QA by Machine Learning
"Who is the Prime Minister of Japan?"
1. Question analysis: Q-type classification by ML
2. Document retrieval: answer-candidate docs ("日本の小泉首相がブッシュ大統領と…", "森前首相は訪問先の…")
3. Select answers: NE answer classification (question + NE → answer): 1. Koizumi 2. Mori

Question-Biased Term Extraction (QBTE)
Question + Term Extraction = QA
"Who is the Prime Minister of Japan?"
1. Document retrieval
2. Answer extraction by QBTE, i.e., term extraction biased toward the question, applied to docs such as "日本の小泉首相がブッシュ大統領と…" and "森前首相は訪問先の…"
Output: 1. 小泉首相 (Prime Minister Koizumi) 2. 森 (Mori)

NTCIR-5 WEB: Navigational Retrieval Task 2
• Goal: "known-item search": to search for one or more representative Web pages on a known item
• Data set: NW1000G-04 (1.36 TB, or 1.5 × 10^12 bytes, of Web pages crawled in 2004)
• Topics: 400 + 800
  – Only the TITLE part (1-3 search terms) is mandatory
  – Analyzing relationships among search techniques, topic types, search-item categories, and relevant page styles
• Submitted runs: 35 (+28 by organizers) (+3 with trouble)
• Relevance judgment: relevant, partially relevant, non-relevant. "Representativeness" was judged based on all available information, e.g., the provider of the page, content (text, images, etc.), URL, and in-/out-linked pages.

NTCIR-5 WEB: Navigational Retrieval Task 2
• Evaluation measures: DCG and MRR at the top-10 document level
• Evaluation results (tendency on MRR & DCG):
  – Several anchor-based systems performed best.
  – Link-based and URL-based methods made no contribution to anchor-based systems.
  – Several link-based systems performed fairly well.
  – Content-based systems performed poorly.
• Future work:
  – Evaluate systems considering duplication of relevant/partially relevant documents
  – Verify the stability of the evaluation measures
  – Check the comprehensiveness of the assessment results
  – Study evaluation measures reflecting users’ overall cost
  – Analyze the topic-by-topic behavior of each system
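The following is a minimal sketch of DCG and MRR over the top 10 ranked documents. It assumes hypothetical gain values (relevant = 3, partially relevant = 1, non-relevant = 0) and the common log2(rank + 1) discount. The slide gives neither the gains nor the discount used at NTCIR-5. Järvelin and Kekäläinen's original DCG, for instance, leaves ranks below the logarithm base undiscounted. Whether partially relevant pages count toward MRR is exposed as a parameter.

```python
import math

# Hypothetical gains for the three judgment levels named on the slide.
GAINS = {"R": 3, "P": 1, "N": 0}  # relevant, partially relevant, non-relevant

def dcg_at_k(judgments, k=10, gains=GAINS):
    """judgments: relevance labels of the ranked documents, in rank order."""
    return sum(gains[label] / math.log2(rank + 1)
               for rank, label in enumerate(judgments[:k], start=1))

def reciprocal_rank_at_k(judgments, k=10, counted=("R",)):
    """1/rank of the first document whose label is in `counted`; 0 if none in the top k."""
    for rank, label in enumerate(judgments[:k], start=1):
        if label in counted:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs, k=10, counted=("R",)):
    """runs: one judgment list per topic."""
    return sum(reciprocal_rank_at_k(j, k, counted) for j in runs) / len(runs)

topic_a = ["N", "R", "P"]
topic_b = ["R"]
print(dcg_at_k(topic_a), mean_reciprocal_rank([topic_a, topic_b]))
```

MRR rewards only the first hit, which suits known-item search. DCG also credits additional relevant or partially relevant pages further down the list.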

Homework
• Funding: combine multiple funding sources; pilot tasks through NII’s open-call collaborative research grant
• Organization: how to select new tasks; how to terminate old tasks
• Results analysis: submission raw data. Let’s work together!
• Community crisis!!! Divided into "providers" and "users"
• Advertisement: how to convey the importance of evaluation
  – Main achievements?
  – Publications?
  – Effect and visibility?

Contact Info & Online Proceedings
The documents used are in Asian languages, but participation from all over the world is more than welcome!!
• Open Submission Session for NTCIR-5
• Inquiries: Noriko Kando at kando (at) nii.ac.jp
• Online proceedings, application & other info: http://research.nii.ac.jp/ntcir/

Thanks! Merci, Danke schön, Grazie, Gracias, Ta!, Tack, Köszönöm, Kiitos, Terima Kasih, Khap Khun, Asante, Tak, 謝謝, ありがとう
http://research.nii.ac.jp/ntcir/

Details of Relevant Documents (A: rigid relevant)
[Diagram: overlap of A-relevant documents found via citations, JIPA (ISJ*), and system pooling; region counts shown: 19, 0, 17, 40, 0, 25, 58]
Total number of A-relevant documents: 159
*ISJ = Interactive Search and Judgment

Results: Subtask 1
[Chart: MRR; ratio of questions whose 1st-ranked answer is correct; ratio with a correct answer among the top 5]

Results: Subtask 2
[Chart: average F-measure, precision, and recall over all questions]

NTCIR-4 WEB
• (A) Informational Retrieval Task
• (B) Navigational Retrieval Task
• [Pilot] (C) Geographical Task
• [Pilot] (D) Topical Classification Task: classification of retrieval results, e.g., using clustering
Documents:
  – "NW100G-01" (100 GB of Web pages crawled in 2001 from "*.jp") for Subtasks A and B
  – "Target data" (a subset of NW100G-01) for Subtasks C and D