INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Lecture # 15
• Merge Sort
ACKNOWLEDGEMENTS
The presentation of this lecture has been taken from the following sources:
1. “Introduction to Information Retrieval” by Prabhakar Raghavan, Christopher D. Manning, and Hinrich Schütze
2. “Managing Gigabytes” by Ian H. Witten, Alistair Moffat, and Timothy C. Bell
3. “Modern Information Retrieval” by Ricardo Baeza-Yates
4. “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon, and Marco Brambilla
Outline
• Two-Way Merge Sort
• Single-pass in-memory indexing
• SPIMI-Invert
Two-Way Merge Sort
• Example
• Explain the improvement over two-way merge obtained by incorporating n-way merge
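The difference between two-way and n-way merging can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the lecture's own example: the function names `merge_two`, `two_way_merge_sort`, and `n_way_merge` are hypothetical, and the sorted runs are plain lists rather than files on disk. Two-way merging needs about log2(k) passes over the data for k runs, whereas an n-way merge that reads all runs at once needs a single pass.

```python
import heapq


def merge_two(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def two_way_merge_sort(runs):
    """Merge sorted runs pairwise until one run remains.

    Returns (sorted_list, number_of_passes). Each pass touches every item,
    which on disk would mean reading and writing the whole data set again.
    """
    runs = [list(r) for r in runs]
    passes = 0
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge_two(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])  # odd run out is carried forward
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes


def n_way_merge(runs):
    """Merge all sorted runs in a single pass using a min-heap."""
    return list(heapq.merge(*runs))
```

With four runs, `two_way_merge_sort` needs two passes and with eight runs three, while `n_way_merge` always reads each item once. On disk, fewer passes means fewer full reads and writes of the data.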
SPIMI: Single-pass in-memory indexing
• Key idea 1: Generate a separate dictionary for each block, so there is no need to maintain a term-termID mapping across blocks.
• Key idea 2: Don’t sort. Accumulate postings in postings lists as they occur.
• With these two ideas we can generate a complete inverted index for each block.
• These separate indexes can then be merged into one big index.
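The two key ideas can be sketched as follows. This is a simplified illustration under stated assumptions, not the published SPIMI algorithm verbatim: the names `spimi_invert` and `build_blocks` are hypothetical, the memory limit is simulated by a fixed number of (term, docID) pairs per block (`max_postings`), blocks are kept in memory instead of being written to disk, and docIDs are assumed to arrive in non-decreasing order.

```python
from itertools import islice


def spimi_invert(token_stream, max_postings):
    """Build the index for one block from up to max_postings (term, doc_id) pairs.

    token_stream must be an iterator so that successive calls continue
    where the previous block stopped.
    """
    dictionary = {}  # key idea 1: a fresh dictionary per block, no global termIDs
    for term, doc_id in islice(token_stream, max_postings):
        postings = dictionary.setdefault(term, [])
        # key idea 2: append postings as they occur, without sorting pairs
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)
    # Terms are sorted once, when the finished block is emitted
    return sorted(dictionary.items())


def build_blocks(pairs, max_postings):
    """Split a stream of (term, doc_id) pairs into per-block inverted indexes."""
    stream = iter(pairs)
    blocks = []
    while True:
        block = spimi_invert(stream, max_postings)
        if not block:  # stream exhausted
            break
        blocks.append(block)
    return blocks
```

Each block is a complete inverted index for its part of the collection, stored as a list of (term, postings list) entries sorted by term.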
SPIMI-Invert
• Merging of blocks is analogous to BSBI (blocked sort-based indexing).
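A minimal sketch of the merge step, assuming each block is an in-memory list of (term, postings list) entries sorted by term and that blocks are passed in order of increasing docID. The function name `merge_blocks` is hypothetical, and a real implementation would stream the blocks from disk rather than hold them in lists.

```python
import heapq
from itertools import groupby


def merge_blocks(blocks):
    """Merge per-block indexes into one index, concatenating postings per term."""
    # n-way merge of all blocks by term; entries with equal terms become adjacent
    merged = heapq.merge(*blocks, key=lambda entry: entry[0])
    index = []
    for term, group in groupby(merged, key=lambda entry: entry[0]):
        postings = []
        for _, block_postings in group:
            postings.extend(block_postings)  # blocks assumed to be in docID order
        index.append((term, postings))
    return index
```

For example, merging the blocks `[("a", [1]), ("b", [1, 2])]` and `[("a", [3]), ("c", [3])]` yields one entry per term, with the postings for "a" combined into `[1, 3]`.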
Summary
• So far
  • Static collections (corpus is fixed)
  • Linked-list-based indexing
  • Array-based indexing
  • Data fits on the hard disk of a single machine
• Next
  • Data does not fit on a single machine
  • Requires more machines (clusters of machines)
Resources for today’s lecture
• Chapter 4 of IIR
• MG Chapter 5
• Original publication on MapReduce: Dean and Ghemawat (2004)
• Original publication on SPIMI: Heinz and Zobel (2003)