Inverted Index Compression and Query Processing with Optimized Document Ordering

Inverted Index Compression and Query Processing with Optimized Document Ordering
2009-07-28
SUHARA YOSHIHIKO

Today’s Paper
• Hao Yan, Shuai Ding and Torsten Suel
• Inverted Index Compression and Query Processing with Optimized Document Ordering
• WWW 2009

DocID compression

Index compression methods
• Var-Byte coding (var-byte)
• Rice coding
• S9
• S16
• PForDelta (PFD)
• Interpolative coding (IPC)

Variable-byte coding (var-byte)
• A byte-wise compression algorithm
• Each 8-bit byte is interpreted as follows:
  – 1 bit: continuation bit
    • 1 if the code ends at this byte, 0 otherwise
  – 7 bits: payload
• Example: 00000110 10111000 = 824
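The byte layout above can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the function names `vb_encode_number` and `vb_decode` are made up here, and the sketch assumes the most significant 7-bit chunk is written first and that the continuation bit is the high bit of each byte.

```python
from typing import List


def vb_encode_number(n: int) -> bytes:
    """Encode one non-negative integer as var-byte.

    Assumption: most significant 7-bit chunk first; the high bit is set
    to 1 only on the last byte of the code.
    """
    if n < 0:
        raise ValueError("var-byte encodes non-negative integers only")
    chunks = [n % 128]
    n //= 128
    while n > 0:
        chunks.insert(0, n % 128)  # prepend the next 7-bit payload
        n //= 128
    chunks[-1] |= 0x80  # continuation bit 1: the code ends here
    return bytes(chunks)


def vb_decode(data: bytes) -> List[int]:
    """Decode a concatenation of var-byte codes back into integers."""
    numbers: List[int] = []
    n = 0
    for b in data:
        if b < 128:  # continuation bit 0: more bytes follow
            n = n * 128 + b
        else:  # continuation bit 1: last byte of this code
            numbers.append(n * 128 + (b - 128))
            n = 0
    return numbers


encoded = vb_encode_number(824)
print(" ".join(format(b, "08b") for b in encoded))
print(vb_decode(encoded + vb_encode_number(5)))
```

In a posting list the encoder would normally be applied to the gaps between consecutive docIDs rather than to the docIDs themselves, so that small gaps fit in a single byte.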

Interpolative coding (IPC)
• Order in which binary_code(target, lo, hi) is called

Interpolative coding algorithm (excerpt from Managing Gigabytes)
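The algorithm excerpt itself did not survive in this transcript, so the following is a minimal Python sketch of the general interpolative coding idea, not the slide's or the book's exact listing. The name `binary_code(target, lo, hi)` is taken from the slide; `interpolative_encode`, `interpolative_decode`, and the bit-string output are assumptions made for illustration. It also assumes a plain fixed-width binary code for each value, whereas practical implementations may use a more economical minimal binary code.

```python
from typing import List


def binary_code(target: int, lo: int, hi: int) -> str:
    """Write target - lo with just enough bits for hi - lo + 1 values."""
    assert lo <= target <= hi
    width = (hi - lo).bit_length()  # 0 bits when lo == hi
    return format(target - lo, "b").zfill(width) if width else ""


def interpolative_encode(docids: List[int], lo: int, hi: int) -> str:
    """Encode a strictly increasing list whose values all lie in [lo, hi]."""
    f = len(docids)
    if f == 0:
        return ""
    h = f // 2
    mid = docids[h]
    # h ids lie below mid and f - h - 1 above it, which narrows mid's range.
    bits = binary_code(mid, lo + h, hi - (f - h - 1))
    bits += interpolative_encode(docids[:h], lo, mid - 1)
    bits += interpolative_encode(docids[h + 1:], mid + 1, hi)
    return bits


def interpolative_decode(bits: str, f: int, lo: int, hi: int) -> List[int]:
    """Invert interpolative_encode given the list length f and the range."""
    pos = 0

    def read(lo_: int, hi_: int) -> int:
        nonlocal pos
        width = (hi_ - lo_).bit_length()
        value = lo_ + (int(bits[pos:pos + width], 2) if width else 0)
        pos += width
        return value

    def rec(f_: int, lo_: int, hi_: int) -> List[int]:
        if f_ == 0:
            return []
        h = f_ // 2
        mid = read(lo_ + h, hi_ - (f_ - h - 1))
        left = rec(h, lo_, mid - 1)
        right = rec(f_ - h - 1, mid + 1, hi_)
        return left + [mid] + right

    return rec(f, lo, hi)


postings = [3, 8, 9, 11, 12, 13, 17]
code = interpolative_encode(postings, 1, 20)
print(len(code), interpolative_decode(code, len(postings), 1, 20))
```

In this sketch the middle element is coded first, then the left half, then the right half. A run of consecutive docIDs that exactly fills its range costs zero bits, which gives some intuition for why interpolative coding can benefit when document ordering clusters the docIDs in a list.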

Effect of docID reordering
• Comparison of IPC methods under docID reordering

Intuitive interpretation (2/2)
• sorted (no decoding needed)
  – 春樹 (Haruki) → docID block
  – 村上 (Murakami) → docID block
• unsorted
  – 春樹 (Haruki) → docID block
  – 村上 (Murakami) → docID block

References
• [Zhang 08] J. Zhang, X. Long and T. Suel. Performance of Compressed Inverted List Caching in Search Engines. WWW 2008.
  – Presented by Uematsu at last year's DBIR reading group
• [Silvestri 07] F. Silvestri. Sorting out the document identifier reassignment. ECIR 2007.
• [Anh 05] V. N. Anh and A. Moffat. Inverted Index Compression using Word-Aligned Binary Codes. Information Retrieval, 8(1): 151-166, 2005.