BTrees But first a little note about data
B-Trees
But first, a little note about data structures �Not all data structures work well as file structures �Example: Binary Search Tree Knight Gibson Coleman Sanders Hudson Monroe
Motivation for B-Trees �Index too large for memory �search time better than binary search �not just fast search, but also fast delete and fast insert �What's the "B" stand for? ◦ Bayer and Mc. Creight ◦ Boeing ◦ balanced, bushy, broad
Data File Level Two Indexed Files �Searching ◦ Dannelly Level One �Deleting ◦ Duncan �Adding ◦ Walters ◦ Aardvark Key IRRN 0 Adams 0 1 Davis 10 Name Key DRRN 0 Adams 6 0 Carey 1 Barnes 2 1 Foster 2 Bell 18 2 Barnes 3 Bishop 8 3 Zinn 4 Camp 80 4 Critter 5 Carey 0 5 Faulk 6 Conner 19 6 Adams 7 Critter 4 7 Wilks 8 Crook 99 8 Bishop 9 Dannelly 21 9 Farrow 10 Davis 20 10 Duncan 11 Dinkins 11 11 Dinkins 12 Duncan 10 12 West . . . . 18 Faulk 5 18 Bell 19 Farrow 9 19 Conner 20 Foster 1 20 Davis 21 Fuller 98 21 Dannelly . . . . 80 West 12 80 Camp 2 Foster 20 3 Ingram 30 4 Lambert 40 5 Norris 50 6 Randall 60 7 Tyler 70 81 Wilks 7 81 Zane 8 West 80 . . . . 98 Zane 81 98 Fuller Young 90 99 Zinn 3 99 Crook 9 Yadda yadda
B-Tree Informal Definition �multi-level �nodes indexes are indexes, indexes are nodes �"Order" - maximum references in a node, minimum references is ½ the order �When node fills, split it and move up largest key �When node is too empty, combine it with parent
Example Insertion �Insert these letters into an 4 -order B-Tree �C S D T A M P I B W N G U R �After C S D and T C �After A �M D S A, node Dsplits. T and largest keys move up C D S C T and P are added to right node, but so is I D A T D I P M T P S T
�C SDTAMPIBWNGUR �B, W, and N are no problem D A B C D �Insertion I B C D M D G �Inserting A B C M N P I M S P M W N W P S P T W P T D N G T U R causes right node to split, then root to split D W of G causes another split, then U is no problem A P I M W P U R S T W W
Analysis �Order Size? ◦ match to disk cluster size and memory �Number of file accesses to Search ◦ depth of tree ◦ so bigger the Order the better �Order = 100 and Levels = 4 == 100 million records �Number of file accesses to Delete 1. search downward to the leaf 2. modify node 3. if it was largest, adjust parent node
Analysis �Number of file accesses to Add Øbest case 1. search downward 2. adjust the node Øworst case (split) 1. search downward to the leaf 2. insert, overflow detect, split upward 3. create new root node
Definition of B-Tree In general, a B-Tree of Order N has the following properties: 1. the root has at least two descendants, or is a leaf 2. each node has no mode that N descendents 3. each node that is not the root or a leaf has at least N/2 descendants 4. all leaf nodes are at the same level 5. a nonleaf node with k descendants contains k-1 key values
B+ Trees �Since search time = depth of tree, we need to keep the tree short and wide �Uneven tree (some full nodes and some near empty nodes, or leans to one side) creates poor performance �Using a slightly smarter split method during add keeps the tree short and balanced �B+ Tree is the de facto standard for databases
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 Storing a B-Tree in Files �Data File ◦ order does not matter �Index File ◦ lists of indexes ◦ each index: key, RRN When inserting a new node, does its placement in the index file matter? D A B C M P W P T D N G I M IRN 4 8 U S T RRN 12 16 20 24 28 4 8 0 2 11 7 5 10 6 13 1 3 W P R Key P W --D M P -T W --A B C D G I M -N P --R S T -- W
Next Class Review of the entire semester v Sorting and Binary Searching v FAT v. NTFS v. Linux v File Access Times v Fragmented Files v which is best storage method ◦ Indexed ◦ B-Tree ◦ Hashed
- Slides: 13