CMU SCS Carnegie Mellon Univ Dept of Computer
- Slides: 128
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15 -415 - Database Applications Lecture#9: Indexing (R&G ch. 10) Faloutsos CMU SCS 15 -415
CMU SCS Outline • • • Motivation ISAM B-trees (not in book) B+ trees duplicates B+ trees in practice Faloutsos CMU SCS 15 -415 2
CMU SCS Introduction • How to support range searches? • equality searches? Faloutsos CMU SCS 15 -415 3
CMU SCS Range Searches • ``Find all students with gpa > 3. 0’’ • may be slow, even on sorted file • What to do? Page 1 Faloutsos Page 2 Page 3 CMU SCS 15 -415 Page N Data File 4
CMU SCS Range Searches • ``Find all students with gpa > 3. 0’’ • may be slow, even on sorted file • Solution: Create an `index’ file. Page 1 Faloutsos Page 2 Index File k. N k 1 k 2 Page 3 CMU SCS 15 -415 Page N Data File 5
CMU SCS Range Searches • More details: • if index file is small, do binary search there • Otherwise? ? Page 1 Faloutsos Page 2 Index File k. N k 1 k 2 Page 3 CMU SCS 15 -415 Page N Data File 6
CMU SCS ISAM • Repeat recursively! Non-leaf Pages Leaf Pages Faloutsos CMU SCS 15 -415 7
CMU SCS ISAM • OK - what if there are insertions and overflows? Non-leaf Pages Leaf Pages Faloutsos CMU SCS 15 -415 8
CMU SCS ISAM • Overflow pages, linked to the primary page Non-leaf Pages Leaf Pages Overflow page Faloutsos Primary pages CMU SCS 15 -415 9
CMU SCS Example ISAM Tree • 2 entries per page Root 40 10* 15* Faloutsos 20 33 20* 27* 51 33* 37* 40* CMU SCS 15 -415 46* 51* 63 55* 63* 97* 10
CMU SCS ISAM Details • format of an index page? • how full would a newly created ISAM be? Faloutsos CMU SCS 15 -415 11
CMU SCS ISAM Details • format of an index page? • how full would a newly created ISAM be? – ~80 -90% (not 100%) P 0 Faloutsos K 1 P 1 K 2 P 2 CMU SCS 15 -415 K m Pm 12
CMU SCS ISAM is a STATIC Structure • that is, index pages don’t change • File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then overflow pgs. Faloutsos CMU SCS 15 -415 13
CMU SCS ISAM is a STATIC Structure • Search: Start at root; use key comparisons to go to leaf. • Cost = log F N ; • F = # entries/pg (i. e. , fanout), • N = # leaf pgs Faloutsos CMU SCS 15 -415 14
CMU SCS ISAM is a STATIC Structure Insert: Find leaf that data entry belongs to, and put it there. Overflow page if necessary. Delete: Find and remove from leaf; if empty page, de-allocate. Faloutsos CMU SCS 15 -415 15
CMU SCS Example: Insert 23*, 48*, 41*, 42* Root 40 Index Pages 20 33 20* 27* 51 63 51* 55* Primary Leaf 10* 15* 33* 37* 40* 46* 48* 41* Pages Overflow 23* 63* 97* Pages 42* Faloutsos CMU SCS 15 -415 16
CMU SCS . . . then delete 42*, 51*, 97* Root 40 Index Pages 20 33 20* 27* 51 63 51* 55* Primary Leaf 10* 15* 33* 37* 40* 46* 48* 41* Pages Overflow 23* 63* 97* Pages 42* * Note that 51* appears in index levels, but not in leaf! Faloutsos CMU SCS 15 -415 17
CMU SCS ISAM ---- Issues? • Pros – ? ? • Cons – ? ? Faloutsos CMU SCS 15 -415 18
CMU SCS Outline • • • Motivation ISAM B-trees (not in book) B+ trees duplicates B+ trees in practice Faloutsos CMU SCS 15 -415 19
CMU SCS B-trees • the most successful family of index schemes (B-trees, B+-trees, B*-trees) • Can be used for primary/secondary, clustering/non-clustering index. • balanced “n-way” search trees Faloutsos CMU SCS 15 -415 20
CMU SCS B-trees [Rudolf Bayer and Mc. Creight, E. M. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173189, 1972. ] Faloutsos CMU SCS 15 -415 21
CMU SCS B-trees Eg. , B-tree of order d=1: 6 <6 >6 1 Faloutsos 3 9 <9 7 >9 13 CMU SCS 15 -415 22
CMU SCS B - tree properties: • each node, in a B-tree of order d: – – – Key order at most n=2 d keys at least d keys (except root, which may have just 1 key) all leaves at the same level if number of pointers is k, then node has exactly k-1 keys – (leaves are empty) pn p 1 v 2 … vn-1 Faloutsos CMU SCS 15 -415 23
CMU SCS Properties • “block aware” nodes: each node -> disk page • O(log (N)) for everything! (ins/del/search) • typically, if d = 50 - 100, then 2 - 3 levels • utilization >= 50%, guaranteed; on average 69% Faloutsos CMU SCS 15 -415 24
CMU SCS Queries • Algo for exact match query? (eg. , ssn=8? ) 6 <6 >6 1 Faloutsos 3 9 <9 >9 7 13 CMU SCS 15 -415 25
CMU SCS JAVA animation! http: //slady. net/java/bt/ strongly recommended! Faloutsos CMU SCS 15 -415 26
CMU SCS Queries • Algo for exact match query? (eg. , ssn=8? ) 6 <6 >6 1 Faloutsos 3 9 <9 >9 7 13 CMU SCS 15 -415 27
CMU SCS Queries • Algo for exact match query? (eg. , ssn=8? ) 6 <6 >6 1 Faloutsos 3 9 <9 >9 7 13 CMU SCS 15 -415 28
CMU SCS Queries • Algo for exact match query? (eg. , ssn=8? ) 6 <6 >6 1 Faloutsos 3 9 <9 >9 7 13 CMU SCS 15 -415 29
CMU SCS Queries • Algo for exact match query? (eg. , ssn=8? ) 6 <6 >6 1 Faloutsos 3 9 <9 H steps (= disk accesses) >9 7 13 CMU SCS 15 -415 30
CMU SCS Queries • what about range queries? (eg. , 5<salary<8) • Proximity/ nearest neighbor searches? (eg. , salary ~ 8 ) Faloutsos CMU SCS 15 -415 31
CMU SCS Queries • what about range queries? (eg. , 5<salary<8) • Proximity/ nearest neighbor searches? (eg. , salary ~ 8 ) 6 <6 >6 1 Faloutsos 3 9 <9 7 CMU SCS 15 -415 >9 13 32
CMU SCS Queries • what about range queries? (eg. , 5<salary<8) • Proximity/ nearest neighbor searches? (eg. , salary ~ 8 ) 6 <6 >6 1 Faloutsos 3 9 <9 7 CMU SCS 15 -415 >9 13 33
CMU SCS Queries • what about range queries? (eg. , 5<salary<8) • Proximity/ nearest neighbor searches? (eg. , salary ~ 8 ) 6 <6 >6 1 Faloutsos 3 9 <9 7 CMU SCS 15 -415 >9 13 34
CMU SCS Queries • what about range queries? (eg. , 5<salary<8) • Proximity/ nearest neighbor searches? (eg. , salary ~ 8 ) 6 <6 >6 1 Faloutsos 3 9 <9 7 CMU SCS 15 -415 >9 13 35
CMU SCS B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively) • split: preserves B - tree properties Faloutsos CMU SCS 15 -415 36
CMU SCS B-trees Easy case: Tree T 0; insert ‘ 8’ 6 <6 >6 1 Faloutsos 3 9 <9 7 >9 13 CMU SCS 15 -415 37
CMU SCS B-trees Tree T 0; insert ‘ 8’ 6 <6 >6 1 Faloutsos 3 9 >9 <9 7 8 CMU SCS 15 -415 13 38
CMU SCS B-trees Hardest case: Tree T 0; insert ‘ 2’ 6 <6 >6 1 3 9 <9 7 >9 13 2 Faloutsos CMU SCS 15 -415 39
CMU SCS B-trees Hardest case: Tree T 0; insert ‘ 2’ 6 1 2 3 9 7 13 push middle up Faloutsos CMU SCS 15 -415 40
CMU SCS B-trees Hardest case: Tree T 0; insert ‘ 2’ Ovf; push middle 1 Faloutsos 22 6 7 3 CMU SCS 15 -415 9 13 41
CMU SCS B-trees Hardest case: Tree T 0; insert ‘ 2’ 6 Final state 9 2 1 Faloutsos 7 3 CMU SCS 15 -415 13 42
CMU SCS B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’) • split: preserves all B - tree properties (!!) • notice how it grows: height increases when root overflows & splits • Automatic, incremental re-organization (contrast with ISAM!) Faloutsos CMU SCS 15 -415 43
CMU SCS Pseudo-code INSERTION OF KEY ’K’ find the correct leaf node ’L’; if ( ’L’ overflows ){ split ’L’, and push middle key to parent node ’P’; if (’P’ overflows){ repeat the split recursively; } else{ add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */ } Faloutsos CMU SCS 15 -415 44
CMU SCS Overview • . . . • B – trees – Dfn, Search, insertion, deletion • . . . Faloutsos CMU SCS 15 -415 45
CMU SCS Deletion Rough outline of algo: • Delete key; • on underflow, may need to merge In practice, some implementors just allow underflows to happen… Faloutsos CMU SCS 15 -415 46
CMU SCS B-trees – Deletion Easiest case: Tree T 0; delete ‘ 3’ 6 <6 >6 1 Faloutsos 3 9 <9 7 >9 13 CMU SCS 15 -415 47
CMU SCS B-trees – Deletion Easiest case: Tree T 0; delete ‘ 3’ 6 <6 >6 1 Faloutsos 9 <9 7 >9 13 CMU SCS 15 -415 48
CMU SCS B-trees – Deletion • Case 1: delete a key at a leaf – no underflow • Case 2: delete non-leaf key – no underflow • Case 3: delete leaf-key; underflow, and ‘rich sibling’ • Case 4: delete leaf-key; underflow, and ‘poor sibling’ Faloutsos CMU SCS 15 -415 49
CMU SCS B-trees – Deletion • Case 1: delete a key at a leaf – no underflow (delete 3 from T 0) 6 <6 >6 1 Faloutsos 3 9 <9 7 CMU SCS 15 -415 >9 13 50
CMU SCS B-trees – Deletion • Case 2: delete a key at a non-leaf – no underflow (eg. , delete 6 from T 0) 6 <6 >6 1 Faloutsos 3 Delete & promote, ie: 9 <9 7 CMU SCS 15 -415 >9 13 51
CMU SCS B-trees – Deletion • Case 2: delete a key at a non-leaf – no underflow (eg. , delete 6 from T 0) 9 <6 >6 1 Faloutsos 3 Delete & promote, ie: <9 7 CMU SCS 15 -415 >9 13 52
CMU SCS B-trees – Deletion • Case 2: delete a key at a non-leaf – no underflow (eg. , delete 6 from T 0) 3 <6 >6 1 Faloutsos Delete & promote, ie: 9 <9 7 CMU SCS 15 -415 >9 13 53
CMU SCS B-trees – Deletion • Case 2: delete a key at a non-leaf – no underflow (eg. , delete 6 from T 0) FINAL TREE <3 3 >3 1 Faloutsos 9 <9 7 CMU SCS 15 -415 >9 13 54
CMU SCS B-trees – Deletion • Case 2: delete a key at a non-leaf – no underflow (eg. , delete 6 from T 0) • Q: How to promote? • A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree) • Observation: every deletion eventually becomes a deletion of a leaf key Faloutsos CMU SCS 15 -415 55
CMU SCS B-trees – Deletion • Case 1: delete a key at a leaf – no underflow • Case 2: delete non-leaf key – no underflow • Case 3: delete leaf-key; underflow, and ‘rich sibling’ • Case 4: delete leaf-key; underflow, and ‘poor sibling’ Faloutsos CMU SCS 15 -415 56
CMU SCS B-trees – Deletion • Case 3: underflow & ‘rich sibling’ (eg. , delete 7 from T 0) 6 <6 >6 1 Faloutsos 3 Delete & borrow, ie: 9 <9 7 CMU SCS 15 -415 >9 13 57
CMU SCS B-trees – Deletion • Case 3: underflow & ‘rich sibling’ (eg. , delete 7 from T 0) Rich sibling 1 Faloutsos 6 <6 >6 Delete & borrow, ie: 9 <9 3 >9 13 CMU SCS 15 -415 58
CMU SCS B-trees – Deletion • Case 3: underflow & ‘rich sibling’ • ‘rich’ = can give a key, without underflowing • ‘borrowing’ a key: THROUGH the PARENT! Faloutsos CMU SCS 15 -415 59
CMU SCS B-trees – Deletion • Case 3: underflow & ‘rich sibling’ (eg. , delete 7 from T 0) Rich sibling 1 6 <6 >6 Delete & borrow, ie: 9 <9 3 >9 13 NO!! Faloutsos CMU SCS 15 -415 60
CMU SCS B-trees – Deletion • Case 3: underflow & ‘rich sibling’ (eg. , delete 7 from T 0) 6 <6 >6 1 Faloutsos Delete & borrow, ie: 9 <9 3 >9 13 CMU SCS 15 -415 61
CMU SCS B-trees – Deletion • Case 3: underflow & ‘rich sibling’ (eg. , delete 7 from T 0) 3 <6 >6 1 Faloutsos Delete & borrow, ie: 9 <9 6 CMU SCS 15 -415 >9 13 62
CMU SCS B-trees – Deletion • Case 3: underflow & ‘rich sibling’ (eg. , delete 7 from T 0) FINAL TREE 3 <3 >3 1 Faloutsos Delete & borrow, through the parent 9 <9 6 CMU SCS 15 -415 >9 13 63
CMU SCS B-trees – Deletion • Case 1: delete a key at a leaf – no underflow • Case 2: delete non-leaf key – no underflow • Case 3: delete leaf-key; underflow, and ‘rich sibling’ • Case 4: delete leaf-key; underflow, and ‘poor sibling’ Faloutsos CMU SCS 15 -415 64
CMU SCS B-trees – Deletion • Case 4: underflow & ‘poor sibling’ (eg. , delete 13 from T 0) 6 <6 >6 1 Faloutsos 3 9 <9 7 CMU SCS 15 -415 >9 13 65
CMU SCS B-trees – Deletion • Case 4: underflow & ‘poor sibling’ (eg. , delete 13 from T 0) 6 <6 >6 1 Faloutsos 3 9 <9 >9 7 CMU SCS 15 -415 66
CMU SCS B-trees – Deletion • Case 4: underflow & ‘poor sibling’ (eg. , delete 13 from T 0) 6 <6 >6 1 Faloutsos 3 9 <9 >9 A: merge w/ ‘poor’ sibling 7 CMU SCS 15 -415 67
CMU SCS B-trees – Deletion • Case 4: underflow & ‘poor sibling’ (eg. , delete 13 from T 0) • Merge, by pulling a key from the parent • exact reversal from insertion: ‘split and push up’, vs. ‘merge and pull down’ • Ie. : Faloutsos CMU SCS 15 -415 68
CMU SCS B-trees – Deletion • Case 4: underflow & ‘poor sibling’ (eg. , delete 13 from T 0) <6 A: merge w/ ‘poor’ sibling 6 >6 1 Faloutsos 3 7 CMU SCS 15 -415 9 69
CMU SCS B-trees – Deletion • Case 4: underflow & ‘poor sibling’ (eg. , delete 13 from T 0) FINAL TREE <6 6 >6 1 Faloutsos 3 7 CMU SCS 15 -415 9 70
CMU SCS B-trees – Deletion • • Case 4: underflow & ‘poor sibling’ -> ‘pull key from parent, and merge’ Q: What if the parent underflows? A: repeat recursively Faloutsos CMU SCS 15 -415 71
CMU SCS B-tree deletion - pseudocode DELETION OF KEY ’K’ locate key ’K’, in node ’N’ if( ’N’ is a non-leaf node) { delete ’K’ from ’N’; find the immediately largest key ’K 1’; /* which is guaranteed to be on a leaf node ’L’ */ copy ’K 1’ in the old position of ’K’; invoke this DELETION routine on ’K 1’ from the leaf node ’L’; else { /* ’N’ is a leaf node */. . . (next slide. . ) Faloutsos CMU SCS 15 -415 72
CMU SCS B-tree deletion - pseudocode /* ’N’ is a leaf node */ if( ’N’ underflows ){ let ’N 1’ be the sibling of ’N’; if( ’N 1’ is "rich"){ /* ie. , N 1 can lend us a key */ borrow a key from ’N 1’ THROUGH the parent node; }else{ /* N 1 is 1 key away from underflowing */ MERGE: pull the key from the parent ’P’, and merge it with the keys of ’N’ and ’N 1’ into a new node; if( ’P’ underflows){ repeat recursively } } } Faloutsos CMU SCS 15 -415 73
CMU SCS Outline • Motivation • ISAM • B-trees (not in book) – algorithms – extensions • B+ trees • duplicates • B+ trees in practice Faloutsos CMU SCS 15 -415 74
CMU SCS Variations • How could we do even better than the Btrees above? Faloutsos CMU SCS 15 -415 75
CMU SCS B*-tree • In B-trees, worst case util. = 50%, if we have just split all the pages • how to increase the utilization of B - trees? • . . with B* - trees! Faloutsos CMU SCS 15 -415 76
CMU SCS B-trees and B*-trees Eg. , Tree T 0; insert ‘ 2’ 6 <6 >6 1 3 9 <9 7 >9 13 2 Faloutsos CMU SCS 15 -415 77
CMU SCS B*-trees: deferred split! • Instead of splitting, LEND keys to sibling! (through PARENT, of course!) 6 <6 >6 1 3 9 <9 7 >9 13 2 Faloutsos CMU SCS 15 -415 78
CMU SCS B*-trees: deferred split! • Instead of splitting, LEND keys to sibling! (through PARENT, of course!) 3 <3 >3 1 2 9 >9 <9 6 FINAL TREE 7 13 2 Faloutsos CMU SCS 15 -415 79
CMU SCS B*-trees: deferred split! • Notice: shorter, more packed, faster tree • It’s a rare case, where space utilization and speed improve together • BUT: What if the sibling has no room for our ‘lending’? Faloutsos CMU SCS 15 -415 80
CMU SCS B*-trees: deferred split! • A: 2 -to-3 split: get the keys from the sibling, pool them with ours (and a key from the parent), and split in 3. • Could we extend the idea to 3 -to-4 split, 4 to-5 etc? Faloutsos CMU SCS 15 -415 81
CMU SCS B*-trees: deferred split! • A: 2 -to-3 split: get the keys from the sibling, pool them with ours (and a key from the parent), and split in 3. • Could we extend the idea to 3 -to-4 split, 4 to-5 etc? • Yes, but: diminishing returns Faloutsos CMU SCS 15 -415 82
CMU SCS Outline • • • Motivation ISAM B-trees (not in book) B+ trees duplicates B+ trees in practice Faloutsos CMU SCS 15 -415 83
CMU SCS B+ trees - Motivation B-tree – print keys in sorted order: 6 <6 >6 1 Faloutsos 3 9 <9 7 >9 13 CMU SCS 15 -415 84
CMU SCS B+ trees - Motivation B-tree needs back-tracking – how to avoid it? 6 <6 >6 1 Faloutsos 3 9 <9 7 >9 13 CMU SCS 15 -415 85
CMU SCS B+ trees - Motivation Stronger reason: for clustering index, data records are scattered: 6 <6 >6 1 Faloutsos 3 9 <9 7 >9 13 CMU SCS 15 -415 86
CMU SCS Solution: + B - trees • facilitate sequential ops • They string all leaf nodes together • AND • replicate keys from non-leaf nodes, to make sure every key appears at the leaf level • (vital, for clustering index!) Faloutsos CMU SCS 15 -415 87
CMU SCS B+ trees 6 <6 >=6 1 Faloutsos 3 9 >=9 <9 6 7 CMU SCS 15 -415 9 13 88
CMU SCS B+ trees 6 <6 >=6 1 Faloutsos 3 9 >=9 <9 6 7 CMU SCS 15 -415 9 13 89
CMU SCS B+trees • More details: next (and textbook) • In short: on split – at leaf level: COPY middle key upstairs – at non-leaf level: push middle key upstairs (as in plain B-tree) Faloutsos CMU SCS 15 -415 90
CMU SCS Example B+ Tree • Search begins at root, and key comparisons direct it to a leaf (as in ISAM). • Search for 5*, 15*, all data entries >= 24*. . . Root 13 2* 3* 5* 7* 14* 16* 17 24 30 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Based on the search for 15*, we know it is not in the tree! Faloutsos CMU SCS 15 -415 91
CMU SCS B+ Trees in Practice • Typical order: 100. Typical fill-factor: 67%. – average fanout = 2*100*0. 67 = 134 • Typical capacities: – Height 4: 1334 = 312, 900, 721 entries – Height 3: 1333 = 2, 406, 104 entries Faloutsos CMU SCS 15 -415 92
CMU SCS B+ Trees in Practice • Can often keep top levels in buffer pool: – Level 1 = 1 page = 8 KB – Level 2 = 134 pages = 1 MB – Level 3 = 17, 956 pages = 140 MB Faloutsos CMU SCS 15 -415 93
CMU SCS Inserting a Data Entry into a B+ Tree • Find correct leaf L. • Put data entry onto L. – If L has enough space, done! – Else, must split L (into L and a new node L 2) • Redistribute entries evenly, copy up middle key. • parent node may overflow – but then: push up middle key. Splits “grow” tree; root split increases height. Faloutsos CMU SCS 15 -415 94
CMU SCS Example B+ Tree - Inserting 8* Root 13 2* 3* 5* Faloutsos 7* 14* 16* 17 24 19* 20* 22* 23* CMU SCS 15 -415 24* 27* 29* 95
CMU SCS Example B+ Tree - Inserting 8* Root 13 2* 3* 5* 7* Faloutsos 13 17 24* 27* 29* 24 >=5 <5 3* 24 19* 20* 22* 23* 14* 16* 5* 2* 17 5* 7* 8* 14* 16* CMU SCS 15 -415 19* 20* 22* 23* 24* 27* 29* 96
CMU SCS Example B+ Tree - Inserting 21* Root 5 13 17 24 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* Faloutsos CMU SCS 15 -415 24* 27* 29* 97
CMU SCS Example B+ Tree - Inserting 21* Root 5 13 17 24 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 23* 2* 3* 5* 7* 8* 14* 16* 19* 20* Faloutsos CMU SCS 15 -415 24* 27* 29* 21* 22* 23* 24* 27* 29* 98
CMU SCS Example B+ Tree Root 17 5 2* 3* 21 13 5* 7* 8* 14* 16* 19* 20* 24 21* 22* 23* 24* 27* 29* • Notice that root was split, increasing height. • Could use defer-split here. (Pros/Cons? ) Faloutsos CMU SCS 15 -415 99
CMU SCS Example: Data vs. Index Page Split Data Page Split • leaf: ‘copy’ • non-leaf: ‘push’ 2* • why not ‘copy’ @ non-leaves? Index Page Split 5 Faloutsos 3* 13 2* 7* 5* 7* 8* … 5 5* 3* 8* 5 13 17 21 24 17 21 CMU SCS 15 -415 24 100
CMU SCS Now you try… Root 30 5 2* 3* 13 5* … (not shown) 20 7* 8* 11* 14* 16* 21* 22* 23* Insert the following data entries (in order): 28*, 6*, 25* Faloutsos CMU SCS 15 -415 101
CMU SCS Answer… After inserting 28*, 6* 30 5 2* 3* 7 5* 6* 13 … 20 7* 8* 11* 14* 16* 21* 22* 23* 28* After inserting 25* Faloutsos CMU SCS 15 -415 102
CMU SCS Answer… After inserting 25* 13 5 2* 3* Faloutsos 30 20 7 5* 6* 7* 8* 11* 14* 16* CMU SCS 15 -415 … 23 21* 22* 23* 25* 28* 103
CMU SCS Deleting a Data Entry from a B+ Tree • Start at root, find leaf L where entry belongs. • Remove the entry. – If L is at least half-full, done! – If L underflows • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). • If re-distribution fails, merge L and sibling. – update parent – and possibly merge, recursively Faloutsos CMU SCS 15 -415 104
CMU SCS Example: Delete 19* & 20* Root 17 12 5 2* 3* 13 5* 7* 8* Deleting 19* is easy: 14* 16* Root 24 19* 22* 20* 22* 30 24* 27* 29* 33* 34* 38* 39* 17 3 5 2* 3* 27 13 5* 7* 8* 14* 16* 22* 24* 30 27* 29* • Deleting 20* -> re-distribution (notice: Faloutsos 27 copied up) CMU SCS 15 -415 33* 34* 38* 39* 105
CMU SCS . . . And Then Deleting 24* Root 3 5 2* 3* 27 13 5* 7* 8* 14* 16* Root 4 5 2* 3* 17 22* 24* 7* 8* 27* 29* 30 14* 16* 22* 27* 29* SCS 15 -415 • Must merge leaves. CMU… but are we done? ? Faloutsos 33* 34* 38* 39* 17 13 5* 30 33* 34* 38* 39* 106
CMU SCS . . . Merge Non-Leaf Nodes, Shrink Tree Root 4 5 2* 3* 30 13 5* 7* 8* 22* 27* 29* 14* 16* 33* 34* 38* 39* Root 5 2* 17 5 3* Faloutsos 5* 7* 8* 13 17 30 14* 16* CMU SCS 15 -415 22* 27* 29* 33* 34* 38* 39* 107
CMU SCS Example of Non-leaf Redistribution • Tree is shown below during deletion of 24*. • Now, we can re-distribute keys Root 22 5 2* 3* 5* 7* 8* Faloutsos 13 14* 16* 17 30 20 17* 18* CMU SCS 15 -415 20* 21* 22* 27* 29* 33* 34* 38* 39* 108
CMU SCS After Re-distribution • need only re-distribute ‘ 20’; did ‘ 17’, too • why would we want to re-distributed more keys? Root 17 5 2* 3* Faloutsos 5* 7* 8* 13 14* 16* 20 17* 18* CMU SCS 15 -415 20* 21* 22 30 22* 27* 29* 33* 34* 38* 39* 109
CMU SCS Main observations for deletion • If a key value appears twice (leaf + nonleaf), the above algorithms delete it from the leaf, only • why not non-leaf, too? Faloutsos CMU SCS 15 -415 110
CMU SCS Main observations for deletion • If a key value appears twice (leaf + nonleaf), the above algorithms delete it from the leaf, only • why not non-leaf, too? • ‘lazy deletions’ - in fact, some vendors just mark entries as deleted (~ underflow), – and reorganize/compact later Faloutsos CMU SCS 15 -415 111
CMU SCS Recap: main ideas • on overflow, split (and ‘push’, or ‘copy’) – or consider deferred split • on underflow, borrow keys; or merge – or let it underflow. . . Faloutsos CMU SCS 15 -415 112
CMU SCS Outline • • • Motivation ISAM B-trees (not in book) B+ trees duplicates B+ trees in practice – prefix compression; bulk-loading; ‘order’ Faloutsos CMU SCS 15 -415 113
CMU SCS B+ trees with duplicates • Everything so far: assumed unique key values • How to extend B+-trees for duplicates? – Alt. 2: <key, rid> – Alt. 3: <key, {rid list}> • 2 approaches, roughly equivalent Faloutsos CMU SCS 15 -415 114
CMU SCS B+ trees with duplicates • approach#1: repeat the key values, and extend B+ tree algo’s appropriately - eg. many ‘ 14’s 13 2* 3* 5* Faloutsos 7* 13* 14* 14 24 14* 22* 23* CMU SCS 15 -415 24* 27* 29* 115
CMU SCS B+ trees with duplicates • approach#1: subtle problem with deletion: • treat rid as part of the key, thus making it unique 13 2* 3* 5* Faloutsos 7* 13* 14* 14 24 14* 22* 23* CMU SCS 15 -415 24* 27* 29* 116
CMU SCS B+ trees with duplicates • approach#2: store each key value: once • but store the {rid list} as variable-length field (and use overflow pages, if needed) 13 2* 3* 5* Faloutsos 7* 13* 14* {rid list} {rid list, cont’d} 14 24 22* 23* CMU SCS 15 -415 24* 27* 29* 117
CMU SCS Outline • • • Motivation ISAM B-trees (not in book) B+ trees duplicates B+ trees in practice – prefix compression; bulk-loading; ‘order’ Faloutsos CMU SCS 15 -415 118
CMU SCS Prefix Key Compression • Important to increase fan-out. (Why? ) • Key values in index entries only `direct traffic’; can often compress them. Papadopoulos Faloutsos Pernikovskaya CMU SCS 15 -415 119
CMU SCS Prefix Key Compression • Important to increase fan-out. (Why? ) • Key values in index entries only `direct traffic’; can often compress them. Pap Faloutsos Per <room for more separators/keys> CMU SCS 15 -415 120
CMU SCS Bulk Loading of a B+ Tree • In an empty tree, insert many keys • Why not one-at-a-time? Faloutsos CMU SCS 15 -415 121
CMU SCS Bulk Loading of a B+ Tree • Initialization: Sort all data entries • scan list; whenever enough for a page, pack • <repeat for upper level - even faster than book’s algo> Root 3* 4* Faloutsos Sorted pages of data entries; not yet in B+ tree 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* CMU SCS 15 -415 38* 41* 44* 122
CMU SCS Bulk Loading (Contd. ) Root • Book’s algo • (any problems? ) 10 12 6 3* 4* 6* 9* 20 23 20 10 Faloutsos 3* 4* 12 CMU SCS 15 -415 6* 9* not yet in B+ tree 10* 11* 12* 13* 20*22* 23* 31* 35* 36* 38*41* 44* Root 6 Data entry pages 35 Data entry pages not yet in B+ tree 35 23 38 123 10* 11* 12* 13* 20*22* 23* 31* 35* 36* 38*41* 44*
CMU SCS Outline • • • Motivation ISAM B-trees (not in book) B+ trees duplicates B+ trees in practice – prefix compression; bulk-loading; ‘order’ Faloutsos CMU SCS 15 -415 124
CMU SCS A Note on `Order’ • Order (d) concept replaced by physical space criterion in practice (`at least half-full’). • Why do we need it? – Index pages can typically hold many more entries than leaf pages. – Variable sized records and search keys mean different nodes will contain different numbers of entries. – Even with fixed length fields, multiple records with the same search key value (duplicates) can lead to variable-sized data entries (if we use Alternative (3)). Faloutsos CMU SCS 15 -415 125
CMU SCS A Note on `Order’ • Many real systems are even sloppier than this: they allow underflow, and only reclaim space when a page is completely empty. • (what are the benefits of such ‘slopiness’? ) Faloutsos CMU SCS 15 -415 126
CMU SCS Conclusions • B+tree is the prevailing indexing method • Excellent, O(log. N) worst-case performance for ins/del/search; (~3 -4 disk accesses in practice) • guaranteed 50% space utilization; avg 69% Faloutsos CMU SCS 15 -415 127
CMU SCS Conclusions • Can be used for any type of index: primary/secondary, sparse (clustering), or dense (non-clustering) • Several fine-extensions on the basic algorithm – deferred split; prefix compression; (underflows) – bulk-loading – duplicate handling Faloutsos CMU SCS 15 -415 128
- Cmu comp bio
- Carnegie mellon interdisciplinary
- Carnegie mellon software architecture
- Cmu bomb threat
- Carnegie mellon software architecture
- Citi training cmu
- Mism carnegie mellon
- Randy pausch carnegie mellon
- National robotics initiative
- Iit
- 18-213 cmu
- Cmu vpn
- Carnegie mellon
- Carnegie mellon
- Carnegie mellon
- Carnegie mellon
- Carnegie mellon fat letter
- 15-513 cmu
- Cmu bomb lab
- Mellon serbia iskustva
- Carneigh mellon
- Conclusion of breastfeeding
- Bny mellon health savings account
- Mellon tubes
- Water mellon
- Mellon elf
- Mellon elf
- Mellon elf
- Scs desco
- Scs method
- Lluvia neta
- Tabel panjang lengkung peralihan
- Antecedent moisture condition
- Simbol komponen diac
- Scs curve number
- Curva tiristor
- Wiki.scs
- Scs.ryerson.ca harley
- Rangkaian fet
- Scs reasonable person principle
- Scs thyristor
- Scs carleton
- Scs archiver
- Jenis lengkung
- Scs elogs
- Scs lulu
- Scs methode
- Doc scs
- Carotenoid score
- 15-441
- Homorogeneous
- Synchronization in computer architecture
- Ut arlington va office
- Univ 3.constantine medecine
- Ch rahmoune
- Sug grant
- Fs boumerdes
- Celcat univ nantes
- Fs.univ.umbb
- Univ constantine 3
- Pharmacie.univ-batna2
- Celcat univ nantes
- Univ el bayadh
- Prodoc univ nantes
- Univ prof titel
- Moodle univtln
- Mail univ ouargla
- Sfa univ poitiers
- (univ. caxias do sul) escolha a alternativa que completa
- Lon capa ohio univ
- Iut valenciennes ent
- Ent université tours
- Snv constantine
- Kalkulatorische zinsen
- Was andrew carnegie a hero
- The rise
- Was andrew carnegie bad
- Modelo carnegie
- Dept nmr spectroscopy
- Fl dept of agriculture
- Finance dept structure
- Worcester ma building department
- Andrew carnegie vertical integration
- Andrew carnegie vertical integration
- Dept. name of organization
- Philanthropy carnegie
- Mn dept of education
- Liz welch mississippi
- Carnegie vertical integration
- Dept. name of organization (of affiliation)
- Andrew carnegie and john d. rockefeller venn diagram
- Dale carnegie conversation stack
- Ohio dept of dd
- Affiliation poster presentation
- Carnegie learning
- Acadia computer science
- Vaginal dept
- Gome dept
- Gome dept
- Nyttofunktion
- Gome dept
- Hoe dept
- Firefighter oral interview questions
- Oviposition
- Dept of education
- Florida dept of agriculture and consumer services
- Florida dept of agriculture and consumer services
- Dept a
- Central islip fire department
- Carnegie
- Jack carnegie
- Carnegie
- Bill gates founder of microsoft
- Micah ennis
- Dept of education
- Dept c13 nmr
- Pt dept logistik
- Nys dept of homeland security
- Carnegie robotics llc
- La dept of revenue
- La revenue dept
- Continuing education library oxford
- Jp morgan vertical integration
- Nebraska dept of agriculture
- Iit
- Dept ind onegov
- Albany county dss
- Cmu 16-385
- Kesden cmu