Succinct Data Structure (SDS)
- Slides: 48
Outline
- Definitions
- Succinct Data Structure (SDS)
  - SDS for bit sequences
  - SDS for trees
  - SDS for strings
- Compressed full-text indexes using SDS
  - Suffix Arrays and the Burrows-Wheeler transform
  - FM-index, Compressed Suffix Arrays
- Summary and future goals
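Example
- T = aacbbcbc
  - |T| = 8, n_a = 2, n_b = 3, n_c = 3
- For k = 0
  - H_0(T) = (2/8)*log2(8/2) + … ≈ 1.56
- For k = 2 (the context of a character is the two characters that follow it, padded with $)
  - Σ^2 = {ac, cb, bb, bc, c$, $$}
  - T_ac = a, T_cb = ab, T_bb = c, …
  - H_2(T) = (1/8)*0 + (2/8)*1 + … = 0.5
- H_5 of real data
  - English text: 0.23
  - DNA: 0.24
  - XML: 0.10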
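The entropy example above can be checked mechanically. The following is a minimal sketch, assuming base-2 logarithms (bits per character) and the slide's convention that the context of a character is the k characters following it, padded with `$`. The function names `entropy0` and `entropyK` are illustrative and not from the slides.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>

// Zeroth-order empirical entropy of s, in bits per character.
double entropy0(const std::string& s) {
    if (s.empty()) return 0.0;
    std::map<char, int> freq;
    for (char c : s) ++freq[c];
    double h = 0.0;
    for (const auto& kv : freq) {
        const double p = static_cast<double>(kv.second) / s.size();
        h -= p * std::log2(p);
    }
    return h;
}

// k-th order empirical entropy: H_k(T) = (1/|T|) * sum over contexts w of |T_w| * H_0(T_w),
// where T_w collects the characters that are immediately followed by context w.
// Assumption: the text is padded with k copies of '$' so every character has a context.
double entropyK(const std::string& t, int k) {
    if (t.empty()) return 0.0;
    const std::string padded = t + std::string(k, '$');
    std::map<std::string, std::string> byContext;
    for (std::size_t i = 0; i < t.size(); ++i)
        byContext[padded.substr(i + 1, k)] += t[i];
    double total = 0.0;
    for (const auto& kv : byContext)
        total += kv.second.size() * entropy0(kv.second);
    return total / t.size();
}
```

Calling `entropyK("aacbbcbc", 0)` and `entropyK("aacbbcbc", 2)` evaluates the two cases of the example; with k = 0 every character falls into the single empty context, so the result is simply H_0.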
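popcount(x): count the number of 1 bits in x

// Assumes unsigned int is 32 bits wide.
unsigned int popCount(unsigned int r) {
    r = ((r & 0xAAAAAAAA) >> 1) + (r & 0x55555555);  // sums of adjacent bits (2-bit fields)
    r = ((r & 0xCCCCCCCC) >> 2) + (r & 0x33333333);  // sums of adjacent 2-bit fields
    r = ((r >> 4) + r) & 0x0F0F0F0F;                 // per-byte counts
    r = (r >> 8) + r;                                // add neighbouring bytes
    return ((r >> 16) + r) & 0x3F;                   // total (0..32) is in the low 6 bits
}

0xAAAAAAAA = 101010...10 (binary)
0x55555555 = 010101...01 (binary)
0xCCCCCCCC = 110011...00 (binary)
0x33333333 = 001100...11 (binary)
0x0F0F0F0F = 00001111...00001111 (binary)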
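popcount is the usual building block for rank on a bit sequence. The following is a minimal sketch of that idea, not the structure from the slides: it stores one running count per 32-bit word, which answers rank in constant time but is not succinct as written (real structures use a two-level directory of much smaller counters). The names `RankBitVector`, `popCount32`, and `rank1` are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit-parallel popcount for a 32-bit word (same technique as the slide).
static unsigned popCount32(std::uint32_t r) {
    r = ((r & 0xAAAAAAAAu) >> 1) + (r & 0x55555555u);
    r = ((r & 0xCCCCCCCCu) >> 2) + (r & 0x33333333u);
    r = ((r >> 4) + r) & 0x0F0F0F0Fu;
    r = (r >> 8) + r;
    return ((r >> 16) + r) & 0x3Fu;
}

struct RankBitVector {
    std::vector<std::uint32_t> words;   // bit i lives in words[i / 32], at bit position i % 32
    std::vector<std::uint32_t> prefix;  // prefix[w] = number of 1 bits in words[0..w)

    explicit RankBitVector(const std::vector<bool>& bits)
        : words((bits.size() + 31) / 32, 0), prefix(words.size() + 1, 0) {
        for (std::size_t i = 0; i < bits.size(); ++i)
            if (bits[i]) words[i / 32] |= std::uint32_t(1) << (i % 32);
        for (std::size_t w = 0; w < words.size(); ++w)
            prefix[w + 1] = prefix[w] + popCount32(words[w]);
    }

    // Number of 1 bits among the first i bits, for 0 <= i <= bits.size().
    std::uint32_t rank1(std::size_t i) const {
        std::uint32_t r = prefix[i / 32];
        if (i % 32 != 0) {
            const std::uint32_t mask = (std::uint32_t(1) << (i % 32)) - 1;
            r += popCount32(words[i / 32] & mask);
        }
        return r;
    }
};
```

The number of 0 bits among the first i bits follows as `i - rank1(i)`, so one directory serves both queries.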
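Wavelet Tree example (1/2)
- Σ = {a, b, c}, with Huffman codes a = 0, b = 10, c = 11 (binary)
- T = abbccbaacbab
- Root node, the first bit of each character's code: 011111001101
- 1-child of the root, the string of only the b's and c's: bbccbcbb
  - stored as the second bit of each character's code: 00110100
- Huffman tree: a is the 0-child of the root; b and c are the 0-child and 1-child of the root's 1-child
Wavelet Tree example (2/2)
- Σ = {a, b, c}, a = 0, b = 10, c = 11 (binary)
- T = abbccbaacbab
- Query rank_b(T, 8) = 3, following b's code 10 down the Huffman tree
  - Root, 011111001101: rank_1(8) = 5
  - 1-child, 00110100 (for bbccbcbb): rank_0(5) = 3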
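The rank query walked through above can be sketched in code. This is a simplified illustration under stated assumptions: the prefix codes are passed in by the caller rather than computed by Huffman's algorithm, each tree node is keyed by its code prefix, and the per-node bit rank is a naive linear scan where a real SDS would answer in constant time. The names `WaveletTree` and `rankBit` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Naive rank: occurrences of `bit` among the first i bits of `bits`.
static int rankBit(const std::vector<bool>& bits, bool bit, int i) {
    int count = 0;
    for (int k = 0; k < i; ++k) count += (bits[k] == bit);
    return count;
}

struct WaveletTree {
    std::map<char, std::string> code;               // prefix code of each symbol, e.g. b -> "10"
    std::map<std::string, std::vector<bool>> node;  // bit vector of the node reached by a code prefix

    WaveletTree(const std::string& text, const std::map<char, std::string>& codes)
        : code(codes) {
        for (char ch : text) {
            const std::string& cw = code.at(ch);
            // The d-th bit of the code is appended to the node at depth d on the symbol's path.
            for (std::size_t d = 0; d < cw.size(); ++d)
                node[cw.substr(0, d)].push_back(cw[d] == '1');
        }
    }

    // Occurrences of ch among the first i characters of the text (0 <= i <= text length).
    int rank(char ch, int i) const {
        const std::string& cw = code.at(ch);
        for (std::size_t d = 0; d < cw.size() && i > 0; ++d)
            i = rankBit(node.at(cw.substr(0, d)), cw[d] == '1', i);
        return i;
    }
};
```

With the codes a = 0, b = 10, c = 11 and T = abbccbaacbab, `rank('b', 8)` performs exactly the two steps of the example: a rank of 1s at the root, then a rank of 0s in the root's 1-child.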
Suffix Arrays (SA) [Manber 1989]
- Input: T = t_1 t_2 t_3 ... t_N
- Suffixes of T: S_k = t_k t_{k+1} t_{k+2} ... t_N
- Example with T = abraca$
  (1) Enumerate all suffixes of T
      S1 abraca$
      S2 braca$
      S3 raca$
      S4 aca$
      S5 ca$
      S6 a$
      S7 $
  (2) Sort the set of suffixes in lexicographic order
      S7 $
      S6 a$
      S1 abraca$
      S4 aca$
      S2 braca$
      S5 ca$
      S3 raca$
  (3) Extract the suffix numbers: 7 6 1 4 2 5 3
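The three steps above translate directly into a naive construction. The sketch below sorts suffix start positions by comparing the suffixes themselves; it uses 0-based positions (the slide counts from 1) and takes O(n^2 log n) time in the worst case, so it is for illustration only, not the construction algorithm of the cited paper. The name `buildSuffixArray` is illustrative.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Returns the 0-based start positions of all suffixes of text, in lexicographic order.
// Assumes text ends with a unique terminator such as '$' that sorts before all other characters.
std::vector<int> buildSuffixArray(const std::string& text) {
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);  // (1) enumerate all suffixes by start position
    std::sort(sa.begin(), sa.end(), [&text](int a, int b) {
        // (2) compare suffix text[a..] with suffix text[b..]
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;  // (3) the sorted start positions are the suffix array
}
```

For `abraca$` this yields the slide's array shifted down by one, because positions start at 0 here.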
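Searching with SA
- Input: T = abracadabra$, pattern P = bra
- Run a binary search over the sorted suffixes
  SA  Suffix
  11  $
  10  a$
   7  abra$
   0  abracadabra$
   3  acadabra$
   5  adabra$
   8  bra$
   1  bracadabra$
   4  cadabra$
   6  dabra$
   9  ra$
   2  racadabra$
- Comparisons: bra > adabra$, bra < cadabra$, bra = bra$ (P matches a prefix of the suffix)
- Time complexity
  - occ(P): O(m log n)
  - loc(P): O(m log n + occ(P))
  - With the Hgt array, occ(P) takes O(m + log n)
- Space complexity
  - n log n bits (5n bytes)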
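The binary search above can be sketched as two bound searches over the suffix array, each comparing at most m characters per step, which gives the O(m log n) bound. This is a minimal illustration: the function name `saRange` is hypothetical, positions are 0-based, and the suffix array is assumed to be precomputed and passed in.

```cpp
#include <string>
#include <utility>
#include <vector>

// Returns the half-open range [first, last) of suffix-array rows whose suffix starts with pattern.
// The number of occurrences is last - first; the positions are sa[first], ..., sa[last - 1].
std::pair<int, int> saRange(const std::string& text, const std::vector<int>& sa,
                            const std::string& pattern) {
    const int n = static_cast<int>(sa.size());
    const std::size_t m = pattern.size();

    // First row whose m-character prefix is >= pattern.
    int lo = 0, hi = n;
    while (lo < hi) {
        const int mid = lo + (hi - lo) / 2;
        if (text.compare(sa[mid], m, pattern) < 0) lo = mid + 1; else hi = mid;
    }
    const int first = lo;

    // First row whose m-character prefix is > pattern.
    hi = n;
    while (lo < hi) {
        const int mid = lo + (hi - lo) / 2;
        if (text.compare(sa[mid], m, pattern) <= 0) lo = mid + 1; else hi = mid;
    }
    return {first, lo};
}
```

For T = abracadabra$ and P = bra, the returned range covers the rows of `bra$` and `bracadabra$`, so the occurrence count is 2 and the positions are 8 and 1.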
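Burrows-Wheeler Transform (BWT) [1994]
- A reversible transform (a permutation) of a string
- Definition: BWT[i] := T[SA[i]-1]
  - except that BWT[i] = T[n-1] when SA[i] = 0
- Example: abracadabra$ ⇒ BWT ⇒ ard$rcaaaabb
- Text after the BWT is very easy to compress
  - The same character tends to appear just before the same context
  - cf. Compression boosting [Ferragina 2005]
  - Example, the character t before the context hese:
    t hese are possible...
    t hese were not of..
    t hese...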
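The definition above can be sketched by building the suffix array and reading off the character that precedes each sorted suffix, wrapping around to the last character of T for the suffix that starts at position 0. This is a naive illustration with 0-based positions; the name `bwtTransform` is hypothetical, and the sort-based suffix array is for clarity, not efficiency.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Burrows-Wheeler transform of text, which is assumed to end with a unique
// terminator '$' that sorts before every other character.
std::string bwtTransform(const std::string& text) {
    const int n = static_cast<int>(text.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&text](int a, int b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    std::string bwt(n, '\0');
    for (int i = 0; i < n; ++i)
        bwt[i] = text[(sa[i] + n - 1) % n];  // character just before suffix sa[i], cyclically
    return bwt;
}
```

Applied to `abracadabra$`, this reproduces the string given in the example.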
Before BWT
When Farmer Oak smiled, the corners of his mouth spread till they were within an unimportant distance of his ears, his eyes were reduced to chinks, and diverging wrinkles appeared round them, extending upon his countenance like the rays in a rudimentary sketch of the rising sun. His Christian name was Gabriel, and on working days he was a young man of sound judgment, easy motions, proper dress, and general good character. On Sundays he was a man of misty views, rather given to postponing, and hampered by his best clothes and umbrella: upon the whole, one who felt himself to occupy morally that vast … (continues)
BW transform
After BWT
Ioooooioororooooooooorromrrooomooroooormoororioooro ormmmmmuuiiiii. Iiuuuuuuuiii. Uiiiiiioooooorooooiiiioooioiiiiiioiiiiiieuiiiiiouuuuouu. UUuuuuuuooouuiooriiiriirriiiiiiaiiiiioooooooiiiouioiiiioiiuii uiiiiiiiiiiiioiiiiioiuiiiiiiiiiiiiioiiiiiiuiiiioiiiiiiiiiioiiiiiiioiiiaiiiiiiiiioiiiiiiiiuiiiiiiiiioiiiiiiiiiiiiiiiiiiiiiiuuuiioiiiiiuiiiiiiiiiiiioiiiiuioiuiiii iiioiiiiiiiuiiiiiiiiiiiiiiioaoiiiiioioiiiioooiiiiiooioiiiiiouiiiiiiooiiiiiiiiiiiiiii ioiiiiiiiiiioiooiiiiiioiiiiiuiiiiiiiiiiiiiiiiiioiiiiiiioiiiuiiiiioiiiiiiuoiiioiii iiiiiiiiiiuiiiiuuiiiiiiiiiiiiiiiiiiuiuiiiiiuuiiiiiiiiiiiiiiiiiiiiiiiiiiiioiiiiiiioiiiiioiiiiuiiiioiiiiiiiiiiiiiiiiiiioiioiiiiiiuiiiiiiiiooiiiiiiiiiioooiiiiii iioiiiiouiiiiiiii … (continues)
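T = abracadabra$ (indices are taken modulo n = 12)

i   SA[i]  SA^-1[i]  SA^-1[SA[i]+1]  BWT[i]  Suffix
0   11     3         3               a       $
1   10     7         0               r       a$
2   7      11        6               d       abra$
3   0      4         7               $       abracadabra$
4   3      8         8               r       acadabra$
5   5      5         9               c       adabra$
6   8      9         10              a       bra$
7   1      2         11              a       bracadabra$
8   4      6         5               a       cadabra$
9   6      10        2               a       dabra$
10  9      1         1               b       ra$
11  2      0         4               b       racadabra$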
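Inverse BW transform (LF-mapping)

#include <cstdio>
#include <cstring>

// Prints the original text given its BWT; bwt must contain exactly one '$'.
void revBWT(const char* bwt, int n) {
    int count[0x100];
    memset(count, 0, sizeof(int) * 0x100);
    for (int i = 0; i < n; i++) count[(unsigned char)bwt[i]]++;
    // After the prefix sum, count[c] is the number of characters <= c.
    for (int i = 1; i < 0x100; i++) count[i] += count[i-1];
    // LFmapping[p] = i links row p of the sorted first column to the row i
    // of bwt that holds the same character occurrence.
    int* LFmapping = new int[n];
    for (int i = n-1; i >= 0; i--) {
        LFmapping[--count[(unsigned char)bwt[i]]] = i;
    }
    int next = 0;  // position of '$' in bwt
    while (next < n - 1 && bwt[next] != '$') next++;
    for (int i = 0; i < n; i++) {
        next = LFmapping[next];
        putchar(bwt[next]);
    }
    delete[] LFmapping;
}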
P = "abr", T = "abracadabra$", BWT = "ard$rcaaaabb"

i   SA  BWT  Head of suffix
0   11  a1   $
1   10  r1   a1
2   7   d    a2    sp (abr)
3   0   $    a3    ep (abr)
4   3   r2   a4
5   5   c    a5
6   8   a2   b1    sp (br)
7   1   a3   b2    ep (br)
8   4   a4   c
9   6   a5   d
10  9   b1   r1    sp (r)
11  2   b2   r2    ep (r)

Here C[c] is the number of characters in T smaller than c, minus 1, and rank(BWT, c, i) is the number of occurrences of c in BWT[0..i].

i := m-1; sp := 0; ep := n-1;
while (sp <= ep) and (i >= 0) do
  c := P[i];
  sp := C[c] + rank(BWT, c, sp-1) + 1;
  ep := C[c] + rank(BWT, c, ep);
  i--;
end

Trace:
  initially     sp := 0              ep := 11
  i=2  c='r'    sp := 9+0+1 = 10     ep := 9+2 = 11
  i=1  c='b'    sp := 5+0+1 = 6      ep := 5+2 = 7
  i=0  c='a'    sp := 0+1+1 = 2      ep := 0+3 = 3
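The backward search loop above can be sketched in runnable form. This is a minimal illustration using the slide's conventions, with C[c] taken as the number of characters smaller than c minus 1 and rank counted over BWT[0..i] inclusive. The rank here is a naive scan; an FM-index would answer it with a wavelet tree or a similar rank structure. The names `backwardSearch` and `rankChar` are hypothetical.

```cpp
#include <array>
#include <string>
#include <utility>

// Occurrences of c in bwt[0..i] inclusive; i = -1 gives 0. Naive O(i) scan.
static int rankChar(const std::string& bwt, char c, int i) {
    int count = 0;
    for (int k = 0; k <= i; ++k) count += (bwt[k] == c);
    return count;
}

// Returns the inclusive range [sp, ep] of suffix-array rows prefixed by pattern.
// The pattern does not occur if sp > ep.
std::pair<int, int> backwardSearch(const std::string& bwt, const std::string& pattern) {
    std::array<int, 256> freq{};
    for (unsigned char ch : bwt) ++freq[ch];

    // C[c] = (number of characters in the text smaller than c) - 1.
    std::array<int, 256> C{};
    int smaller = 0;
    for (int c = 0; c < 256; ++c) {
        C[c] = smaller - 1;
        smaller += freq[c];
    }

    int sp = 0;
    int ep = static_cast<int>(bwt.size()) - 1;
    for (int i = static_cast<int>(pattern.size()) - 1; sp <= ep && i >= 0; --i) {
        const unsigned char c = static_cast<unsigned char>(pattern[i]);
        sp = C[c] + rankChar(bwt, static_cast<char>(c), sp - 1) + 1;
        ep = C[c] + rankChar(bwt, static_cast<char>(c), ep);
    }
    return {sp, ep};
}
```

The number of occurrences is `ep - sp + 1` when `sp <= ep`. Note that the search needs only the BWT and the C array, not the text or the suffix array, which is what makes a compressed index possible.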