
Information Retrieval
Prepared by: Cong Chau
Supervised by: Prof. Esma Aimeur
Cong Chau Information Retrieval March 2004

Agenda
• Introduction
• Present Technology
• Advanced Technology
• Tendency
• Conclusion

Introduction
• Definitions:
  – Data, information, knowledge.
• Information is power.


Taxonomy
USER TASK
• Retrieval (specific purpose), Filtering
  – Classic Models: Boolean, Vector, Probabilistic
  – Set Theoretic: Fuzzy, Extended Boolean
  – Algebraic: Generalized Vector, Latent Semantic Indexing, Neural Network
  – Probabilistic: Inference Network, Belief Network
  – Structured Models: Non-overlapping Lists, Proximal Nodes
• Browsing
  – Flat, Structure Guided, Hypertext

Boolean Model
• Based on set theory and Boolean algebra: AND, OR, NOT, ...
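The following is a minimal sketch of Boolean retrieval. It assumes a small in-memory inverted index that maps each term to a set of document IDs. AND, OR and NOT then become set intersection, union and difference, and each document either matches (1) or does not (0). The sample `docs` collection and the nested-tuple query format are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A query is a term, ("AND" | "OR", q1, q2), or ("NOT", q) -- hypothetical format.
Query = Union[str, Tuple]

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased term to the set of document IDs that contain it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def evaluate(query: Query, index: Dict[str, Set[int]], all_ids: Set[int]) -> Set[int]:
    """Evaluate a nested Boolean query with set operations."""
    if isinstance(query, str):
        return index.get(query.lower(), set())
    op = query[0]
    if op == "AND":
        return evaluate(query[1], index, all_ids) & evaluate(query[2], index, all_ids)
    if op == "OR":
        return evaluate(query[1], index, all_ids) | evaluate(query[2], index, all_ids)
    if op == "NOT":
        return all_ids - evaluate(query[1], index, all_ids)
    raise ValueError(f"unknown operator: {op}")

docs = {1: "data structure", 2: "data base", 3: "information retrieval"}
index = build_index(docs)
print(sorted(evaluate(("AND", "data", ("NOT", "base")), index, set(docs))))  # [1]
```

The result has no ranking: document 1 matches exactly and every other document is simply excluded.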


Vector Model
• Represents documents and queries as vectors of term weights, based on how frequently the terms are used. Documents are ranked by their degree of similarity to the query.
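The slide says only that the weights are based on term frequencies. This minimal sketch therefore assumes the common tf-idf weighting (raw term frequency times log(N / n_i)) and uses the cosine measure as the degree of similarity. Both are widespread choices, not necessarily the ones intended here. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def idf_table(docs: List[str]) -> Dict[str, float]:
    """idf = log(N / n_i), where n_i is the number of documents containing term i."""
    df = Counter(t for d in docs for t in set(d.lower().split()))
    return {t: math.log(len(docs) / n) for t, n in df.items()}

def weigh(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    """tf-idf weights: raw term frequency times idf (terms unknown to the collection are dropped)."""
    return {t: tf * idf[t] for t, tf in Counter(text.lower().split()).items() if t in idf}

def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Degree of similarity: cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def rank(query: str, docs: List[str]) -> List[int]:
    """Document indices ordered from most to least similar to the query."""
    idf = idf_table(docs)
    q = weigh(query, idf)
    scores = [cosine(q, weigh(d, idf)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

docs = ["information retrieval", "data retrieval", "knowledge base"]
print(rank("information retrieval", docs))  # [0, 1, 2]
```

Unlike the Boolean model, "data retrieval" still receives a partial score because it shares the term "retrieval" with the query.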

Probabilistic Model
• Each index term $k_i$ is first assigned an initial probability by an expert, for example $P(k_i \mid R) = 0.3$ and $P(k_i \mid \bar{R}) = 0.7$. Here $R$ is the set of relevant documents and $\bar{R}$ is its complement.
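This is a minimal sketch of how such per-term estimates can feed the classic binary independence ranking. Every query term that occurs in a document contributes log(p / (1 - p)) + log((1 - q) / q), where p = P(k_i | R) and q = P(k_i | not R). The probabilities in `probs` are hypothetical inputs, not values from the slide.

```python
import math
from typing import Dict, Set, Tuple

def term_weight(p_rel: float, p_nonrel: float) -> float:
    """log[p / (1 - p)] + log[(1 - q) / q]; p = P(k_i|R), q = P(k_i|not R), both in (0, 1)."""
    return math.log(p_rel / (1 - p_rel)) + math.log((1 - p_nonrel) / p_nonrel)

def bim_score(doc_terms: Set[str], query_terms: Set[str],
              probs: Dict[str, Tuple[float, float]]) -> float:
    """Rank score: sum of the weights of the query terms that also occur in the document."""
    return sum(term_weight(*probs[t]) for t in query_terms & doc_terms if t in probs)

# Hypothetical estimates (P(k_i|R), P(k_i|not R)) for two query terms.
probs = {"retrieval": (0.5, 0.1), "data": (0.5, 0.5)}
query = {"retrieval", "data"}
print(round(bim_score({"information", "retrieval"}, query, probs), 3))  # 2.197
print(bim_score({"data", "base"}, query, probs))  # 0.0
```

Now apply the slide's example values. `term_weight(0.3, 0.7)` equals 2·log(3/7) ≈ -1.69, so that term's presence would lower a document's score.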


Fuzzy Set Model
• Deals with marginal, gradual membership values in [0, 1] rather than the abrupt Boolean values 0 or 1.
• It computes the relations between documents and the fuzzy index terms with algebraic sums (for OR) and algebraic products (for AND).
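This minimal sketch follows the style of the Ogawa, Morita and Kobayashi fuzzy model. Each step is described below:
• A term-term correlation c(a, b) = n_ab / (n_a + n_b - n_ab) is computed from document co-occurrence counts.
• A document's membership in the fuzzy set of a term is the algebraic sum 1 - Π(1 - c) over the document's own terms.
• AND, OR and NOT use the algebraic product, the algebraic sum and the complement.

The toy collection is hypothetical.

```python
from math import prod
from typing import List, Set

def correlation(a: str, b: str, docs: List[Set[str]]) -> float:
    """Term-term correlation c(a, b) = n_ab / (n_a + n_b - n_ab) from co-occurrence counts."""
    n_a = sum(a in d for d in docs)
    n_b = sum(b in d for d in docs)
    n_ab = sum(a in d and b in d for d in docs)
    denom = n_a + n_b - n_ab
    return n_ab / denom if denom else 0.0

def membership(term: str, doc: Set[str], docs: List[Set[str]]) -> float:
    """Degree in [0, 1] to which `doc` belongs to the fuzzy set of `term` (algebraic sum)."""
    return 1 - prod(1 - correlation(term, other, docs) for other in doc)

def fuzzy_and(*mu: float) -> float:
    return prod(mu)                     # algebraic product

def fuzzy_or(*mu: float) -> float:
    return 1 - prod(1 - m for m in mu)  # algebraic sum

def fuzzy_not(mu: float) -> float:
    return 1 - mu                       # complement

docs = [{"data", "information"}, {"information", "retrieval"}, {"data", "base"}]
mu_retrieval = membership("retrieval", docs[0], docs)  # 0.5: term absent, partial degree
mu_data = membership("data", docs[0], docs)            # 1.0: term present
print(fuzzy_and(mu_data, mu_retrieval), fuzzy_or(mu_data, mu_retrieval))  # 0.5 1.0
```

The first document never contains "retrieval". It still belongs to that fuzzy set to degree 0.5 because it shares "information", which co-occurs with "retrieval".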

Extended Boolean Model
• Extends the Boolean model with partial matching and term weighting.
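This is a minimal sketch of the p-norm formulation usually associated with Salton, Fox and Wu. It assumes term weights already normalized to [0, 1] (for example, normalized tf-idf). The parameter p moves the model between vector-like averaging (p = 1) and strict Boolean behavior (large p).

```python
from typing import Sequence

def sim_or(weights: Sequence[float], p: float) -> float:
    """p-norm OR of term weights in [0, 1]: ((sum w_i^p) / m) ** (1/p)."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1 / p)

def sim_and(weights: Sequence[float], p: float) -> float:
    """p-norm AND: 1 - ((sum (1 - w_i)^p) / m) ** (1/p)."""
    return 1 - (sum((1 - w) ** p for w in weights) / len(weights)) ** (1 / p)

# A document that contains only one of two query terms (weights 1.0 and 0.0).
for p in (1, 2, 100):
    print(p, round(sim_or([1.0, 0.0], p), 3), round(sim_and([1.0, 0.0], p), 3))
# prints: 1 0.5 0.5, then 2 0.707 0.293, then 100 0.993 0.007
```

As p grows, OR approaches the Boolean value 1 and AND approaches 0. This is the partial matching the slide describes.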



Generalized Vector Space Model
• Gathers index terms with related meanings into a set, then associates binary (1 or 0) weights with the document-term pairs. Correlations between index terms are thus taken into account instead of assuming the terms are independent (orthogonal).
• For example, the related words ‘data’, ‘information’ and ‘knowledge’ are grouped in a set. Consider six documents: ‘data structure’, ‘data base’, ‘information retrieval’, ‘information age’, ‘knowledge management’ and ‘knowledge base’.
  – The word ‘data’ occurs in the first two documents, giving {1, 1, 0, 0, 0, 0}.
  – The word ‘information’ occurs in ‘information retrieval’ and ‘information age’, giving {0, 0, 1, 1, 0, 0}.
  – Similarly, ‘knowledge’ occurs in ‘knowledge management’ and ‘knowledge base’, giving {0, 0, 0, 0, 1, 1}.
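This is a minimal sketch of the minterm construction used in Wong, Ziarko and Wong's generalized vector space model. Each document's on/off pattern over the vocabulary is a minterm. Each term becomes a unit vector over the minterms in which it occurs, so terms that co-occur get correlated vectors.

In the slide's example, each document contains only one of the three words, so the term vectors stay orthogonal. The sketch therefore adds a hypothetical seventh document containing both ‘data’ and ‘information’. All function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Minterm = Tuple[int, ...]
Vector = Dict[Minterm, float]

def term_vectors(docs: List[Set[str]], vocab: List[str]) -> Dict[str, Vector]:
    """Express each index term as a unit vector over the minterms that occur in `docs`."""
    coeff: Dict[str, Dict[Minterm, float]] = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        minterm = tuple(1 if t in doc else 0 for t in vocab)  # the document's on/off pattern
        for t in vocab:
            if t in doc:
                coeff[t][minterm] += 1.0  # binary weight w_ij = 1
    vectors = {}
    for t in vocab:
        norm = math.sqrt(sum(c * c for c in coeff[t].values()))
        vectors[t] = {m: c / norm for m, c in coeff[t].items()} if norm else {}
    return vectors

def dot(u: Vector, v: Vector) -> float:
    return sum(x * v.get(m, 0.0) for m, x in u.items())

def combine(terms: Set[str], vectors: Dict[str, Vector]) -> Vector:
    """Document or query vector: sum of the vectors of its (binary-weighted) terms."""
    out: Vector = defaultdict(float)
    for t in terms:
        for m, c in vectors.get(t, {}).items():
            out[m] += c
    return dict(out)

def similarity(q: Set[str], d: Set[str], vectors: Dict[str, Vector]) -> float:
    u, v = combine(q, vectors), combine(d, vectors)
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0

vocab = ["data", "information", "knowledge"]
docs = [{"data"}, {"data"}, {"information"}, {"information"},
        {"knowledge"}, {"knowledge"}, {"data", "information"}]  # last one is hypothetical
vecs = term_vectors(docs, vocab)
print(round(dot(vecs["data"], vecs["information"]), 3))       # 0.2
print(round(similarity({"information"}, {"data"}, vecs), 3))  # 0.2
```

A query for "information" now gets a non-zero similarity with a document that contains only "data". The classic vector model would give 0 here.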

Latent Semantic Indexing
• Matches documents through a reduced form of the term-document matrix instead of through the individual index terms.
• M = CTR
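This is a minimal sketch of latent semantic matching. It assumes the reduction is a truncated singular value decomposition (SVD) of the term-by-document matrix, as in the latent semantic indexing work of Deerwester, Dumais, Furnas and colleagues.

The query is folded into the k-dimensional space as qᵀU_kS_k⁻¹ and compared with the document rows of V_k by cosine. It uses NumPy's `numpy.linalg.svd`. The toy matrix is hypothetical.

```python
import numpy as np

def lsi_similarities(M: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Cosine similarity of query q to every document in a rank-k latent space.

    M is a term-by-document matrix (one row per term); q is a query over the same terms.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # M = U @ diag(s) @ Vt
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T
    q_k = (q @ U_k) / s_k  # fold the query in: q^T U_k S_k^{-1}
    # Folding document j in the same way yields row j of V_k.
    return (V_k @ q_k) / (np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k))

M = np.array([[1, 0, 0],   # car
              [0, 1, 0],   # automobile
              [1, 1, 0],   # engine
              [0, 0, 1],   # flower
              [0, 0, 1]],  # garden
             dtype=float)
q = np.array([1, 0, 0, 0, 0], dtype=float)  # query: "car"
print(lsi_similarities(M, q, k=2))
```

Document 1 contains "automobile" and "engine" but never "car", yet it scores about 1.0. Both car-related documents load on the same latent dimension. The flower document scores about 0.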

Neural Network
• Query-term nodes emit signals that propagate from one node to another: from query terms to document terms, and from document terms to documents.
• The indexing is fixed, but the weights change over time.
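This is a minimal sketch of the first propagation pass through a three-layer network: query-term nodes, document-term nodes and document nodes. It assumes the normalization used in Wilkinson and Hingston's cosine-based network. Query-term nodes fire with normalized query weights, and each link into a document node carries that document's normalized term weight.

Under this assumption, the first-pass activation of a document equals the cosine measure. Later passes, where documents feed signals back to terms, and the changing weights are not modeled.

```python
import math
from typing import Dict, List

def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale a weight vector to unit length (empty dict if all weights are zero)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}

def propagate(query: Dict[str, float], docs: List[Dict[str, float]]) -> List[float]:
    """First pass: query-term nodes -> document-term nodes -> document nodes."""
    signals = normalize(query)  # each query-term node fires with its normalized weight
    activations = []
    for doc in docs:
        links = normalize(doc)  # link weight from term node t to this document node
        activations.append(sum(s * links.get(t, 0.0) for t, s in signals.items()))
    return activations

query = {"information": 1.0, "retrieval": 1.0}
docs = [{"information": 1.0, "retrieval": 1.0}, {"data": 1.0, "retrieval": 1.0}]
print([round(a, 3) for a in propagate(query, docs)])  # [1.0, 0.5]
```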


Inference Network Model
• Associates random variables with the index terms.
• Depending on the needs, these variables are then linked to prior probabilities, Boolean operators, or ranking strategies.
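This is a minimal sketch of how an inference network can combine beliefs. It assumes the per-term beliefs P(k_i | d_j) have already been estimated, for example from tf-idf. It then applies closed-form link matrices of the kind described by Turtle and Croft:
• AND is a product.
• OR is one minus the product of complements.
• NOT is a complement.
• A weighted sum serves as the ranking strategy.

The belief values are hypothetical.

```python
from math import prod
from typing import Sequence, Tuple

def bel_and(beliefs: Sequence[float]) -> float:
    """AND node: all parents must be true (independent beliefs multiply)."""
    return prod(beliefs)

def bel_or(beliefs: Sequence[float]) -> float:
    """OR node: true unless every parent is false."""
    return 1 - prod(1 - b for b in beliefs)

def bel_not(belief: float) -> float:
    return 1 - belief

def bel_wsum(pairs: Sequence[Tuple[float, float]]) -> float:
    """Weighted-sum node (ranking strategy): sum(weight * belief) / sum(weight)."""
    return sum(w * b for w, b in pairs) / sum(w for w, _ in pairs)

# Hypothetical beliefs P(k_i = true | d_j) for one document d_j.
belief = {"information": 0.8, "retrieval": 0.6, "database": 0.1}
boolean_q = bel_and([belief["information"], bel_or([belief["retrieval"], belief["database"]])])
ranked_q = bel_wsum([(2.0, belief["information"]), (1.0, belief["retrieval"])])
print(round(boolean_q, 3), round(ranked_q, 3))  # 0.512 0.733
```

The same term beliefs serve either a Boolean-style query or a ranking-style query. Only the link matrix at the query node changes.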

Belief Network Model
• Uses not only a probabilistic interpretation but also set theory and a clearly defined sample space. This lets it model the documents and the queries separately.


Proximal Nodes Model
• Divides a book into a hierarchical tree structure with four levels:
  – the root (book);
  – branches (chapters);
  – smaller branches (sections);
  – leaves (subsections).
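This is a minimal sketch of querying such a hierarchy. It assumes each node has a type (book, chapter, section or subsection), optional text and children. A structural query such as "sections that contain 'retrieval'" is answered by walking the tree.

The `Node` class and the sample book are hypothetical. The actual proximal nodes model supports several independent hierarchies over the same text and richer structural operators; this shows only one hierarchy and one containment query.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

@dataclass
class Node:
    kind: str                 # "book", "chapter", "section" or "subsection"
    title: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Pre-order traversal: this node, then all descendants."""
        yield self
        for child in self.children:
            yield from child.walk()

    def contains(self, word: str) -> bool:
        """True if the word occurs in this node's own text or in any descendant."""
        return any(word in n.text.lower().split() for n in self.walk())

def find(root: Node, kind: str, word: str) -> List[str]:
    """Titles of all nodes of the given kind whose subtree contains the word."""
    return [n.title for n in root.walk() if n.kind == kind and n.contains(word)]

book = Node("book", "IR", children=[
    Node("chapter", "Ch 1", children=[
        Node("section", "1.1", children=[Node("subsection", "1.1.1", text="boolean retrieval")]),
        Node("section", "1.2", children=[Node("subsection", "1.2.1", text="vector model")]),
    ]),
])
print(find(book, "section", "retrieval"))  # ['1.1']
```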

Non-overlapping Lists Model
• Flat: the text is divided into lists of regions (e.g. chapters, sections). Regions within the same list never overlap, although regions from different lists may nest.
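This is a minimal sketch showing why the non-overlap property is useful. It assumes each list is a sorted list of inclusive (start, end) word-position regions. Because regions in one list never overlap, only the last region starting at or before a word position can contain it, so a binary search (`bisect`) is enough.

The region lists and text are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # inclusive (start, end) word positions

def regions_containing(regions: List[Region], positions: List[int]) -> List[Region]:
    """Regions of one sorted, non-overlapping list that contain at least one position."""
    starts = [start for start, _ in regions]
    hits: List[Region] = []
    for pos in positions:
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and regions[i][1] >= pos and regions[i] not in hits:
            hits.append(regions[i])
    return hits

words = "boolean model set theory vector model term weights".split()
lists: Dict[str, List[Region]] = {
    "chapter": [(0, 7)],          # one chapter spanning the whole text
    "section": [(0, 3), (4, 7)],  # sections nest inside the chapter region
}
model_positions = [i for i, w in enumerate(words) if w == "model"]  # [1, 5]
print(regions_containing(lists["section"], model_positions))       # [(0, 3), (4, 7)]
print(regions_containing(lists["section"], [words.index("weights")]))  # [(4, 7)]
```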


Browsing Models
• Flat model: each document is considered a point in a one-dimensional linear array or a two-dimensional plane.
• Structure guided model: the information is organized in hierarchical levels.

Advanced Models
• Voice Browsing: uses a voice user interface (VUI) instead of a graphical user interface (GUI). The user interacts with a digital voice to get the information. Example: Bell’s voice browsing number, 3102355.

Advanced Models (cont.)
• Image Retrieval Model: divides the image into eight regions. It then indexes the image using three kinds of information:
  – the ratios between the red, green and blue (RGB) components;
  – the grayscale conversion of the image;
  – the relationship between shape and color.
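This is a minimal sketch of region-based color features. The slide does not say how the eight regions are laid out, so this sketch makes several assumptions:
• The image is a list of rows of (R, G, B) tuples.
• The regions form a 2 × 4 grid.
• Grayscale uses the ITU-R BT.601 luma weights.

For each region, it computes the share of red, green and blue in the total intensity and the mean gray level. Images are then compared by Euclidean distance. The shape-color relationship is not modeled, and all names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255

def region_features(image: List[List[Pixel]], rows: int = 2, cols: int = 4) -> List[float]:
    """For each of rows * cols regions: [R share, G share, B share, mean gray level].

    Assumes the image has at least `rows` pixel rows and `cols` pixel columns.
    """
    height, width = len(image), len(image[0])
    features: List[float] = []
    for r in range(rows):
        for c in range(cols):
            block = [image[y][x]
                     for y in range(r * height // rows, (r + 1) * height // rows)
                     for x in range(c * width // cols, (c + 1) * width // cols)]
            sums = [sum(p[k] for p in block) for k in range(3)]
            total = sum(sums) or 1  # a black region would otherwise divide by zero
            gray = sum(0.299 * R + 0.587 * G + 0.114 * B for R, G, B in block) / len(block)
            features += [s / total for s in sums] + [gray]
    return features

def distance(f1: List[float], f2: List[float]) -> float:
    """Euclidean distance between two feature vectors (smaller means more similar)."""
    return sum((a - b) ** 2 for a, b in zip(f1, f2)) ** 0.5

red = [[(255, 0, 0)] * 4 for _ in range(2)]   # 2 x 4 image: one pixel per region
blue = [[(0, 0, 255)] * 4 for _ in range(2)]
f_red = region_features(red)
print(len(f_red), f_red[:3])  # 32 [1.0, 0.0, 0.0]
```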

Advanced Models (cont.)
• Graph-Based Induction Model: uses stepwise pair expansion to extract distinctive patterns from a graph.
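This is a minimal sketch of one step of stepwise pair expansion as usually described for Graph-Based Induction. It counts the pairs of linked node labels, picks the most frequent pair, and rewrites ("chunks") each non-overlapping occurrence into a single new node. Repeating the step grows larger patterns.

The sketch assumes a directed graph with unlabeled edges and raw frequency as the selection criterion. Real GBI systems may use other criteria. The sample graph is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # directed edge (from_node, to_node)

def most_frequent_pair(labels: Dict[int, str], edges: List[Edge]) -> Tuple[str, str]:
    """Count (label, label) pairs over all edges and return the most frequent one."""
    counts = Counter((labels[u], labels[v]) for u, v in edges)
    return counts.most_common(1)[0][0]

def chunk(labels: Dict[int, str], edges: List[Edge],
          pair: Tuple[str, str]) -> Tuple[Dict[int, str], List[Edge]]:
    """Contract non-overlapping occurrences of `pair` into single new nodes."""
    new_label = f"({pair[0]}-{pair[1]})"
    owner: Dict[int, int] = {}  # old node id -> id of the chunk that absorbed it
    for u, v in edges:
        if (labels[u], labels[v]) == pair and u not in owner and v not in owner:
            owner[u] = owner[v] = u  # the chunk reuses u's id
    new_labels = {n: (new_label if owner.get(n) == n else lab)
                  for n, lab in labels.items() if owner.get(n, n) == n}
    new_edges: List[Edge] = []
    for u, v in edges:
        a, b = owner.get(u, u), owner.get(v, v)
        if a != b and (a, b) not in new_edges:  # drop edges now inside a chunk
            new_edges.append((a, b))
    return new_labels, new_edges

labels = {1: "C", 2: "O", 3: "C", 4: "O", 5: "H"}
edges = [(1, 2), (3, 4), (1, 5)]
pair = most_frequent_pair(labels, edges)  # ('C', 'O') occurs twice
print(chunk(labels, edges, pair))  # ({1: '(C-O)', 3: '(C-O)', 5: 'H'}, [(1, 5)])
```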

[Figure: Problems]

Advanced Models (cont.)
• Multimedia Retrieval Model: indexes the media using key frames of the video together with audio, text and graphics. These are described with probabilistic and statistical measures (mean, variance, clustering, etc.).
• Multi-Agent System for Web Information Retrieval: many intelligent agents search the web, each trying to accomplish one part of a query. The agents then collaborate to answer the requested query.
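This is a minimal sketch of the multi-agent cooperation step, under three assumptions:
• Each hypothetical agent handles one query term against a small fixed page collection.
• The agents run concurrently using `concurrent.futures.ThreadPoolExecutor`.
• Their partial results are merged by summing scores (CombSUM).

CombSUM is one simple fusion rule and not necessarily the one used by the multi-agent systems on this slide. `PAGES`, `agent` and `answer` are illustrative names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical page collection the agents search: URL -> page text.
PAGES: Dict[str, str] = {
    "a.example/ir": "information retrieval models",
    "b.example/db": "data base systems",
    "c.example/ir-db": "information retrieval and data base integration",
}

def agent(term: str) -> Dict[str, float]:
    """One agent answers one part of the query: pages containing its term, scored by count."""
    return {url: float(text.split().count(term))
            for url, text in PAGES.items() if term in text.split()}

def answer(query: str) -> List[str]:
    """Dispatch one agent per query term, then fuse their results (CombSUM)."""
    parts = query.lower().split()
    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(agent, parts))  # agents work concurrently
    fused: Dict[str, float] = defaultdict(float)
    for result in partial_results:
        for url, score in result.items():
            fused[url] += score
    return sorted(fused, key=lambda u: (-fused[u], u))  # best first, ties by URL

print(answer("information data"))  # ['c.example/ir-db', 'a.example/ir', 'b.example/db']
```

The page matching both parts of the query ranks first because both agents contribute to its score.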

Trend
• Text (T), voice (V), graphics (G) and video (D) … to satisfy that demand. The following function is suggested:
  $Y = B_0 + B_1 T + B_2 V + B_3 G + B_4 D$

[Figure: Multi-Agent]

Information Retrieval in the Future
[Figure labels: Human, Finger Pointing, Laser Pointing, Videos, Speakers, Sound Systems, Web, Displays]

Conclusion
• Taxonomy
• Advanced models
• Tendency
