INTRODUCTION TO INFORMATION RETRIEVAL CS 4323 / 0910-1

YFA
Available online at http://www.ittelkom.ac.id/staf/yanuar
01 YFA CS 4323 S1/IT/IR/E6/0910
Institut Teknologi Telkom

References
• Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval. Cambridge University Press, 2008.
• Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval. Addison-Wesley, 1999.
• William B. Frakes and Ricardo Baeza-Yates, Information Retrieval: Data Structures and Algorithms. Prentice Hall, 1992.
• Amy Langville and Carl Meyer, Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 2006.
• G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. McGraw-Hill, 1983.

References
• Sample examination: http://www.infosci.cornell.edu/Courses/info4300/2009fa/sample-exam.html
• Sample test data: http://www.infosci.cornell.edu/Courses/info4300/2009fa/testData.html

Information Science brings together faculty, students and researchers who share an interest in combining computer science with the social-science study of how people and society interact with information. This course is intended for both Computer Science and Information Science students.

Discussion Class
What is Information Retrieval?

Course Description
This course studies techniques and human factors in discovering information in online information systems. Methods covered include techniques for indexing, searching, browsing and filtering information; descriptive metadata; and the use of classification systems and thesauri, with examples from Web search systems.

Definition
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). Information retrieval can also cover other kinds of data and information problems beyond those specified in this core definition.

The Field of IR
• The world has changed: hundreds of millions of people now engage in information retrieval every day when they use a web search engine or search their email.
• Information retrieval is fast becoming the dominant form of information access, overtaking traditional database-style searching (the sort that is going on when a clerk says to you: “I’m sorry, I can only look up your order if you can give me your order ID”).

The Field of IR (cont’d)
• The field of IR also covers supporting users in browsing or filtering document collections or further processing a set of retrieved documents.
  – Given a set of documents, clustering is the task of coming up with a good grouping of the documents based on their contents.
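
The clustering task can be sketched with single-pass (leader) clustering, one simple way to group documents by content. This is a minimal illustration, not a method prescribed by these slides: it assumes whitespace tokenization, raw term-frequency vectors, cosine similarity, and a hypothetical similarity threshold of 0.3; the example documents are invented.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector using naive whitespace tokenization (an assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Put each document in the most similar existing cluster, or start a new one.

    `threshold` is a hypothetical tuning parameter, not a value from the slides.
    """
    centroids: list[Counter] = []  # summed term vectors of each cluster's members
    members: list[list[int]] = []  # document ids in each cluster
    for doc_id, text in enumerate(docs):
        vec = tf_vector(text)
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(doc_id)
            centroids[best].update(vec)  # the sum points the same way as the mean
        else:
            centroids.append(Counter(vec))
            members.append([doc_id])
    return members


docs = [
    "brutus killed caesar",
    "caesar and brutus in rome",
    "email spam filter",
    "spam filter for email",
]
print(single_pass_cluster(docs))  # → [[0, 1], [2, 3]]
```

Single-pass clustering is fast but depends on the order in which documents arrive; methods such as k-means or hierarchical clustering trade extra computation for more stable groupings.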

An Example IR Problem
• A fat book that many people own is Shakespeare’s Collected Works. Suppose you wanted to determine which plays of Shakespeare contain the words Brutus and Caesar and not Calpurnia.
• The simplest form of document retrieval is for a computer to do this sort of linear scan through the documents.
  – The way to avoid linearly scanning the texts for each query is to index the documents in advance.
• Linear scanning is not enough when we need:
  – to process large document collections quickly;
  – to allow more flexible matching operations;
  – to allow ranked retrieval.
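
The contrast between a linear scan and an index built in advance can be sketched for the query Brutus AND Caesar AND NOT Calpurnia. The play texts below are tiny made-up stand-ins (a few words each, not quotations), and the index is a plain dictionary from each term to the set of titles containing it; a real system would typically store sorted postings lists instead.

```python
import re
from collections.abc import Iterable


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def linear_scan(plays: dict[str, str], must: list[str], must_not: list[str]) -> list[str]:
    """Grep-style retrieval: every query re-reads the full text of every play."""
    hits = []
    for title, text in plays.items():
        words = set(tokenize(text))
        if all(t in words for t in must) and not any(t in words for t in must_not):
            hits.append(title)
    return sorted(hits)


def build_index(plays: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index built once in advance: term -> titles that contain it."""
    index: dict[str, set[str]] = {}
    for title, text in plays.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(title)
    return index


def boolean_query(index: dict[str, set[str]], all_titles: Iterable[str],
                  must: list[str], must_not: list[str]) -> list[str]:
    """Intersect the title sets of required terms, then subtract excluded terms."""
    result = set(all_titles)
    for t in must:
        result &= index.get(t, set())
    for t in must_not:
        result -= index.get(t, set())
    return sorted(result)


# Made-up stand-ins for the full texts of the plays.
plays = {
    "Antony and Cleopatra": "Antony Brutus Caesar Cleopatra",
    "Julius Caesar": "Brutus Caesar Calpurnia",
    "Hamlet": "Brutus Caesar Hamlet",
    "The Tempest": "Prospero Miranda",
}
must, must_not = ["brutus", "caesar"], ["calpurnia"]
index = build_index(plays)
print(linear_scan(plays, must, must_not))                 # → ['Antony and Cleopatra', 'Hamlet']
print(boolean_query(index, plays.keys(), must, must_not))  # → ['Antony and Cleopatra', 'Hamlet']
```

Both functions return the same answer; the difference is that the index is built once, so each later query only touches the entries for its own terms instead of rereading every text.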

Discussion Class
Describe this picture:

Searching and Browsing: The Human in the Loop
[Figure: diagram with the labels "Search index", "Return hits", "Browse repository" and "Return objects".]

Definitions
Information retrieval: Subfield of computer science that deals with automated retrieval of documents (especially text) based on their content and context.
Searching: Seeking specific information within a body of information. The result of a search is a set of hits.
Browsing: Unstructured exploration of a body of information.
Linking: Moving from one item to another by following links, such as citations and references.

The Basics of Information Retrieval
Query: A string of text describing the information that the user is seeking. Each word of the query is called a search term. A query can be a single search term, a string of terms, a phrase in natural language, or a stylized expression using special symbols.
Full-text searching: Methods that compare the query with every word in the text, without distinguishing the function of the various words.
Fielded searching: Methods that search on specific bibliographic or structural fields, such as author or heading.
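
Full-text and fielded searching can be contrasted on a few bibliographic records. The sketch below rests on simple assumptions: records are dictionaries of string fields (built here from the reference list of these slides), search terms are lowercase word tokens, and a record matches only if it contains every term of the query.

```python
import re


def terms(text: str) -> list[str]:
    """Split text into lowercase search terms (Unicode-aware, so 'Schütze' stays whole)."""
    return re.findall(r"\w+", text.lower())


def full_text_search(records: list[dict[str, str]], query: str) -> list[dict[str, str]]:
    """Compare query terms with every word of every field, ignoring what each field means."""
    query_terms = terms(query)
    hits = []
    for record in records:
        words = {w for value in record.values() for w in terms(value)}
        if all(t in words for t in query_terms):
            hits.append(record)
    return hits


def fielded_search(records: list[dict[str, str]], field: str, query: str) -> list[dict[str, str]]:
    """Match query terms only within one named field, such as 'author'."""
    query_terms = terms(query)
    hits = []
    for record in records:
        words = set(terms(record.get(field, "")))
        if all(t in words for t in query_terms):
            hits.append(record)
    return hits


RECORDS = [
    {"author": "Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze",
     "title": "Introduction to Information Retrieval"},
    {"author": "G. Salton, M. J. McGill",
     "title": "Introduction to Modern Information Retrieval"},
    {"author": "Amy Langville, Carl Meyer",
     "title": "Google's PageRank and Beyond: The Science of Search Engine Rankings"},
]

print([r["title"] for r in full_text_search(RECORDS, "introduction retrieval")])
# → ['Introduction to Information Retrieval', 'Introduction to Modern Information Retrieval']
print([r["title"] for r in fielded_search(RECORDS, "author", "introduction")])  # → []
print([r["title"] for r in fielded_search(RECORDS, "author", "salton")])
# → ['Introduction to Modern Information Retrieval']
```

The word "introduction" matches two records in a full-text search but none when the search is restricted to the author field, which is the distinction the two definitions draw.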

Sorting and Ranking Hits
When a user submits a query to a search system, the system returns a set of hits. With a large collection of documents, the set of hits may be very large, so its value to the user depends on the order in which the hits are presented. There are three main methods:
• Sorting the hits, e.g., by date
• Ranking the hits by similarity between query and document
• Ranking the hits by the importance of the documents
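
The three ways of ordering hits can be compared on a toy result set. This is a minimal sketch with hypothetical data: the titles, publication dates and `importance` scores are invented (in a web search engine such a score might come from link analysis), and the similarity function is a deliberately crude stand-in, the fraction of query terms that appear in the document, rather than a weighting scheme such as tf-idf.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    title: str
    published: date
    text: str
    importance: float  # hypothetical precomputed, query-independent score


def sort_by_date(hits: list[Hit]) -> list[Hit]:
    """Newest first; ignores the query entirely."""
    return sorted(hits, key=lambda h: h.published, reverse=True)


def similarity(query: str, text: str) -> float:
    """Toy query-document similarity: share of query terms present in the text."""
    q = set(query.lower().split())
    words = set(text.lower().split())
    return len(q & words) / len(q) if q else 0.0


def rank_by_similarity(hits: list[Hit], query: str) -> list[Hit]:
    """Most similar to the query first."""
    return sorted(hits, key=lambda h: similarity(query, h.text), reverse=True)


def rank_by_importance(hits: list[Hit]) -> list[Hit]:
    """Most important document first, whatever the query was."""
    return sorted(hits, key=lambda h: h.importance, reverse=True)


hits = [
    Hit("doc-a", date(2009, 1, 10), "ranking search results", 0.2),
    Hit("doc-b", date(2006, 5, 1), "pagerank and web search", 0.9),
    Hit("doc-c", date(2008, 3, 15), "search engine ranking of web results", 0.5),
]
query = "search engine ranking"
print([h.title for h in sort_by_date(hits)])               # → ['doc-a', 'doc-c', 'doc-b']
print([h.title for h in rank_by_similarity(hits, query)])  # → ['doc-c', 'doc-a', 'doc-b']
print([h.title for h in rank_by_importance(hits)])         # → ['doc-b', 'doc-c', 'doc-a']
```

The same set of hits comes out in three different orders: date sorting ignores the query, similarity ranking depends on it, and importance ranking uses a score computed in advance. Search engines often combine the last two.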

Examples of Search Systems
Finding files on a computer system (Spotlight for Macintosh).
Library catalog for searching bibliographic records about books and other objects (Library of Congress catalog).
Abstracting and indexing system for finding research information about specific topics (Medline for medical information).
Web search service for finding web pages (Google).

General Applications of IR

Domain-Specific Applications of IR

Quiz and Game

Puzzle

Puzzle D

Puzzle

Puzzle A

Empowering Analysis

Puzzle

Answer: Puzzle 6

Puzzle

Answer: Puzzle 9

YFA, August 2008
Adapted from cs.cornell.edu and cambridge.edu (2nd Edition), February 2008