INVERTED FILES
CS 4323 / 0910-1
YFA
Available online at http://www.ittelkom.ac.id/staf/yanuar
10 YFA CS 4323 S1/IT/IR/E3/1109
Institut Teknologi Telkom
Results of a Search
[Figure: the query and the documents found by the search; each x marks a hit from the search.]
Relevance Feedback (Concept)
[Figure labels: generated new query; expansion.]
[Figure: hits from the original search, with the original query and the reformulated query.
  x = documents identified as nonrelevant
  o = documents identified as relevant]
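The slides do not say how the reformulated query is computed. One common way to do it is the Rocchio formula, which is an assumption here and not something the deck names: the query vector is moved toward the documents marked relevant and away from those marked nonrelevant. The sketch below uses hypothetical term-weight dictionaries and commonly quoted default weights (alpha, beta, gamma), which are illustrative only.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reformulated query as a dict mapping term -> weight.

    query, and every document in relevant / nonrelevant, is a dict
    mapping term -> weight (a sparse vector). This representation is
    an assumption made for the sketch.
    """
    new_query = defaultdict(float)

    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight

    # Move toward the centroid of the relevant documents.
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)

    # Move away from the centroid of the nonrelevant documents.
    for doc in nonrelevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(nonrelevant)

    # Terms whose weight ends up at or below zero are dropped.
    return {term: w for term, w in new_query.items() if w > 0}
```

Terms that occur only in relevant documents enter the new query, which is the "expansion" step; terms that occur only in nonrelevant documents end up with negative weight and are discarded.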
Document Clustering (Concept)
[Figure: documents (x) grouped into clusters.]
Document clusters are a form of automatic classification.
A document may be in several clusters.
Organization of Inverted Files
[Figure: three files.
  Index file: one entry per term (ant, bee, cat, dog, elk, fox, gnu, hog), each with a pointer to its postings.
  Postings file: the inverted lists.
  Documents file.]
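A minimal in-memory sketch of this organization, assuming whitespace tokenization and document ids equal to list positions (both are simplifications, not something the slides specify). The function names and the (term, offset, length) layout of an index entry are hypothetical.

```python
def build_inverted_file(documents):
    """Build an index file and a postings file from a list of texts.

    Returns (index_file, postings_file):
      index_file    - list of (term, offset, length), sorted by term;
                      offset is the "pointer to postings"
      postings_file - all inverted lists stored one after another
    """
    lists = {}
    for doc_id, text in enumerate(documents):
        # set() so that a document is posted once per term.
        for term in set(text.lower().split()):
            lists.setdefault(term, []).append(doc_id)

    index_file = []
    postings_file = []
    for term in sorted(lists):
        index_file.append((term, len(postings_file), len(lists[term])))
        postings_file.extend(lists[term])
    return index_file, postings_file


def lookup(term, index_file, postings_file):
    """Return the ids of the documents that contain term."""
    # Linear scan for brevity; a real index file would be searched
    # with a faster structure.
    for entry_term, offset, length in index_file:
        if entry_term == term:
            return postings_file[offset:offset + length]
    return []
```

The ids returned by `lookup` are then used to fetch the texts from the documents file, which here is simply the original list.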
Decisions in Building an Inverted File: Efficiency and Query Languages
Some query options may require huge computation, e.g.,
- Regular expressions
  If inverted files are stored in lexicographic order,
  - comp* can be processed efficiently
  - *comp cannot be processed efficiently
- Boolean terms
  If A and B are search terms,
  - A or B can be processed by comparing two moderate-sized lists
  - (not A) or (not B) requires two very large lists
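The two cheap cases can be sketched as follows, assuming the index terms are held in a sorted Python list and each postings list is a sorted list of document ids (function names are hypothetical). The suffix search is included only to show why *comp gets no help from the ordering.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Terms matching prefix* (e.g. comp*).

    In lexicographic order all such terms are adjacent, so a binary
    search finds the first one and the scan stops at the first miss.
    """
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def suffix_matches(sorted_terms, suffix):
    """Terms matching *suffix (e.g. *comp).

    The ordering does not help: every term has to be inspected.
    """
    return [term for term in sorted_terms if term.endswith(suffix)]


def union(a, b):
    """A or B: merge two sorted postings lists in one pass."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    result.extend(a[i:])
    result.extend(b[j:])
    return result
```

By contrast, (not A) is every document that does not contain A, which for a typical term is nearly the whole collection; that is why (not A) or (not B) involves two very large lists.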
Postings File
The postings file stores the elements of a sparse matrix, the term assignment matrix.
It is stored as a separate inverted list for each column, i.e., a list corresponding to each term in the index file.
Each element in an inverted list is called a posting, i.e., the occurrence of a term in a document.
Each list consists of one or many individual postings.
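To make the column-wise storage concrete, here is a small sketch that assumes the term assignment matrix has one row per document and one column per term, with 1 marking an occurrence. The function name and the dense list-of-lists input are hypothetical; a real matrix would be far too sparse to store this way, which is the reason for keeping only the postings.

```python
def inverted_lists(matrix, terms):
    """Turn a term assignment matrix into one inverted list per term.

    matrix[d][t] is 1 if document d contains terms[t], else 0.
    Only the non-zero elements (the postings) are kept.
    """
    lists = {term: [] for term in terms}
    for doc_id, row in enumerate(matrix):
        for t, present in enumerate(row):
            if present:
                lists[terms[t]].append(doc_id)
    return lists


terms = ["ant", "bee", "cat"]
matrix = [
    [1, 1, 0],  # document 0 contains ant and bee
    [0, 0, 1],  # document 1 contains cat
    [1, 0, 1],  # document 2 contains ant and cat
]
postings = inverted_lists(matrix, terms)
```

Each value in `postings` is the inverted list for one column of the matrix.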
Postings File: A Linked List for Each Term
[Figure: index entries 1 abacus, 2 actor, 3 aspen, 4 atoll, each pointing to a linked list of postings.]
A linked list for each term is convenient to process sequentially, but slow to update when the lists are long.
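A minimal sketch of one such list, assuming each posting holds only a document id and that the list is kept in ascending order (the slides do not state the ordering; the class and function names are hypothetical). Reading the list is a single pass, while inserting has to walk the list to find the right place, which is the cost the slide refers to.

```python
class Posting:
    """One node of a term's linked list."""

    def __init__(self, doc_id, next_posting=None):
        self.doc_id = doc_id
        self.next_posting = next_posting


def insert_sorted(head, doc_id):
    """Insert doc_id in ascending order and return the (new) head.

    The list is walked from the front, so the cost grows with its
    length. A doc_id that is already present is not added again.
    """
    if head is None or doc_id < head.doc_id:
        return Posting(doc_id, head)
    node = head
    while node.next_posting is not None and node.next_posting.doc_id < doc_id:
        node = node.next_posting
    already_present = node.doc_id == doc_id or (
        node.next_posting is not None and node.next_posting.doc_id == doc_id
    )
    if not already_present:
        node.next_posting = Posting(doc_id, node.next_posting)
    return head


def to_list(head):
    """Sequential processing: follow the links from the first posting."""
    doc_ids = []
    node = head
    while node is not None:
        doc_ids.append(node.doc_id)
        node = node.next_posting
    return doc_ids
```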
Index File Structures: Binary Tree
Input: elk, hog, bee, fox, cat, gnu, ant, dog

             elk
           /     \
        bee       hog
       /   \     /
     ant   cat  fox
             \    \
             dog  gnu
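The tree for this input can be reproduced with a plain, unbalanced binary search tree in which each term is inserted in arrival order. The sketch below is illustrative; the class and function names are hypothetical, and `height` counts levels.

```python
class Node:
    def __init__(self, term):
        self.term = term
        self.left = None
        self.right = None


def insert(root, term):
    """Insert term below root and return the root of the subtree."""
    if root is None:
        return Node(term)
    if term < root.term:
        root.left = insert(root.left, term)
    elif term > root.term:
        root.right = insert(root.right, term)
    return root  # an equal term is already in the tree


def build(terms):
    root = None
    for term in terms:
        root = insert(root, term)
    return root


def in_order(root):
    """Terms in lexicographic order (left, node, right)."""
    if root is None:
        return []
    return in_order(root.left) + [root.term] + in_order(root.right)


def height(root):
    """Number of levels in the tree."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


slide_input = ["elk", "hog", "bee", "fox", "cat", "gnu", "ant", "dog"]
tree = build(slide_input)
```

With the input order from the slide the tree has four levels. Feeding the same eight terms in sorted order produces a chain of eight levels, because every new term goes to the right of the previous one.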
Binary Tree
Advantages
- Can be searched quickly
- Convenient for batch updating
- Easy to add an extra term
- Economical use of storage
Disadvantages
- Less good for lexicographic processing, e.g., comp*
- Tree tends to become unbalanced
If the index is held on disk, it is important to optimize the number of disk accesses.
YFA November 2009 (3rd Edition), February 2008
Adapted from cs.cornell.edu