
Introduction to Information Retrieval
Homework 2: VSM and Summary
Prof. 楊立偉, Department of Information Management, National Taiwan University of Science and Technology (台灣科大資管系)
wyang@ntu.edu.tw
© Copyright 2015 by Willie Yang

Requirements
• Extract keywords using the Homework #1 n-gram approach with tf-idf; extract at least 200 keywords for each topic.
• Use the keywords to tag every document.
• Use the VSM (vector space model) to compute cosine similarity.
• Use the keywords to generate the summary
  – rank sentences by the number of keyword occurrences in each sentence.
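The keyword-extraction requirement can be sketched roughly as follows. The slides do not show the Homework #1 code, so this is only an illustration under stated assumptions: character bigrams and trigrams as candidate terms, idf computed as log(N/df), and scores summed over the documents passed in. The names `char_ngrams` and `top_keywords` are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(text, n_values=(2, 3)):
    """Return character n-grams made only of letters/digits (CJK counts as letters)."""
    grams = []
    for n in n_values:
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            # Skip n-grams that span whitespace or punctuation.
            if all(ch.isalnum() for ch in gram):
                grams.append(gram)
    return grams

def top_keywords(docs, k=200, n_values=(2, 3)):
    """Rank n-grams by tf-idf summed over docs and return the k best (assumed scoring)."""
    term_counts = [Counter(char_ngrams(d, n_values)) for d in docs]
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())          # document frequency: one count per document
    n_docs = len(docs)
    scores = Counter()
    for counts in term_counts:
        for term, tf in counts.items():
            scores[term] += tf * math.log(n_docs / df[term])
    return [term for term, _ in scores.most_common(k)]
```

With this idf, an n-gram that occurs in every document scores zero, so very common character sequences drop out of the keyword list on their own.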

Detailed requirements (1)
• List the 7 most similar documents for each query
1427440615049_N01 逾21年未調價 台灣水價僅全球1/5
1427958013238_N01 爽保樂安、樂胃如 恐也用黑心碳酸鎂
1428298064829_N01 阿帕契案洩軍機 勞乃成坦承「做了不該做的事」
1428007295925_N01 昆凌懷孕 周杰倫臉書秀老婆凸肚照
1427660012352_N01 好友郭俊麟奪首勝 周興哲想沾喜氣
1427354105421_N01 祭李光耀 美派柯林頓代表
1427871470025_N01 幸福人壽前老董千萬交保 地院烏龍?
1427570908187_N01 俄、澳將加入 亞投行已41國申請
1427857112056_N01 陳國恩: 持續推動減化警察協辦業務
1427235339440_N01 李光耀辭世 馬赴星私人弔唁
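One possible way to produce the "7 most similar documents" list is to represent each document as a vector of keyword counts and rank the other documents by cosine similarity. The sketch below assumes raw keyword counts as weights (tf-idf weights would work the same way) and uses hypothetical names `keyword_vector`, `cosine`, and `most_similar`.

```python
import math
from collections import Counter

def keyword_vector(text, keywords):
    """Sparse vector: how often each keyword occurs in the text."""
    return Counter({kw: text.count(kw) for kw in keywords if kw in text})

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(query_id, vectors, k=7):
    """Return up to k (doc_id, score) pairs, best first, excluding the query itself."""
    q = vectors[query_id]
    scored = [(doc_id, cosine(q, vec))
              for doc_id, vec in vectors.items() if doc_id != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Here `vectors` is assumed to be a dict from document ID (such as `1427440615049_N01`) to the vector built by `keyword_vector`.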

Detailed requirements (2)
• List the summary (top 3 sentences) for each query
  – from a single document or multiple documents
  – use punctuation to separate sentences, i.e. sentence-final punctuation marks such as ,。?!
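Sentence separation by punctuation could look like the sketch below. The set of delimiters is an assumption: it contains the marks named on the slide plus their full-width and ASCII counterparts, and `split_sentences` is a hypothetical name.

```python
import re

# Assumed delimiter set: marks from the slide plus full-width/ASCII variants.
SENTENCE_END = ",,。?!?!"

def split_sentences(text, marks=SENTENCE_END):
    """Split text at runs of sentence-final punctuation and drop empty pieces."""
    pattern = "[" + re.escape(marks) + "]+"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

Treating the comma as a delimiter yields clause-sized units, which follows the slide; dropping it from `marks` would give longer, full sentences instead.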

Additional Requirements
• Use k-nearest neighbors (kNN) to tag the topic
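A minimal kNN sketch for topic tagging, assuming documents are already represented as sparse keyword vectors (plain dicts) with known topic labels, and that cosine similarity is the closeness measure. The value k=3 and the names `cosine` and `knn_topic` are illustrative choices, not taken from the slides.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_topic(query_vec, labeled, k=3):
    """Majority vote over the topics of the k labeled vectors most similar to the query.

    labeled is a list of (vector, topic) pairs.
    """
    ranked = sorted(labeled,
                    key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    votes = Counter(topic for _, topic in ranked[:k])
    return votes.most_common(1)[0][0]
```

An odd k reduces the chance of a tied vote when there are two topics; with more topics, ties are still possible and may need an explicit rule.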

Query examples
• Query
1427830978849_N01 柯文哲表態 國台辦讚賞
1427397522891_N01 遊客憂水情 石門水庫「看雨」

Summarization (1)
• Abstraction – use natural language generation technology to paraphrase sections of the source document.
• Extraction – copy the important information into the summary (e.g. key clauses, sentences, or paragraphs).

Summarization (2)
• Single or multiple documents – generate a summary from a single document or from multiple documents (for example, a cluster of news stories on the same topic).
• The latter is called multi-document summarization.

Summarization (3)
• 3 steps for quick implementation
  – Sentence separation
  – Sentence ranking
  – Sentence selection
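The three steps above can be sketched as a small extractive summarizer. This is one possible reading rather than a prescribed solution: sentences are scored by their total keyword occurrence count, ties go to the earlier sentence, and the selected sentences are returned in their original order. The name `summarize` and the delimiter set are assumptions.

```python
import re

def summarize(text, keywords, top_n=3, marks=",,。?!?!"):
    """Return the top_n sentences with the most keyword occurrences."""
    # Step 1: sentence separation at runs of the assumed punctuation marks.
    pieces = re.split("[" + re.escape(marks) + "]+", text)
    sentences = [p.strip() for p in pieces if p.strip()]

    # Step 2: sentence ranking by total keyword occurrences.
    scored = [(sum(s.count(kw) for kw in keywords), idx, s)
              for idx, s in enumerate(sentences)]
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))

    # Step 3: sentence selection, restored to original reading order.
    chosen = sorted(ranked[:top_n], key=lambda t: t[1])
    return [s for _, _, s in chosen]
```

For a multi-document summary, the texts of the documents retrieved for a query can simply be concatenated before calling `summarize`.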

Deliverables (1)
• Program: live demonstration on an actual machine
• Results: put into an Excel or text file