Btrees Eduardo Laber David Sotelo What are Btrees
B-trees Eduardo Laber David Sotelo
What are B-trees? • Balanced search trees designed for secondary storage devices • Similar to AVL-trees but better at minimizing disk I/O operations • Main data structure used by DBMS to store and retrieve information
What are B-trees? • Nodes may have many children (from a few to thousands) • Branching factor can be quite large • Every B-tree of n keys has height O(log n) • In practice, its height is smaller than the height of an AVL-Tree
B-trees and Branching Factor
Definition of B-trees B-tree is a rooted tree containing the following five properties: 1. Every node x has the following attributes: a) x. n, the number of keys stored in node x b) The x. n keys: x. key 1 ≤ x. key 2 ≤. . . ≤ x. keyx. n c) The boolen x. leaf indicating if x is a leaf or an internal node
Definition of B-trees 2. If x is an internal node it contains x. n + 1 pointers x. p 1 , x. p 2 , . . . , x. p(x. n + 1) to its children 3. The keys x. keyi separate ranges of trees stored in each subtree (x. pi , x. pi+1 ) 4. All leaves have the same depth == tree’s height.
Definition of B-trees 5. Bounds on the number of keys of a node: • Let B be a positive integer representing the order of the B-tree. • Every node (except the root) must have at least B keys. • Every node (except the root) must have at most 2 B keys. • Root is free to contain between 1 and 2 B nodes (why? )
Example of a B-tree
Exercise 1 Enumerate all valid B-trees of order 2 that represent the set {1, 2, . . . , 8}
Exercise 1 Solution: 4 1 2 3 5 5 6 7 8 1 2 3 4 3 6 1 2 4 5 7 8 6 7 8
The height of a B-tree Theorem : Let h be the height of a B-tree of n keys and order B > 1. Then: h ≤ log B (n+1)/2 Proof: • • • Root contains at least one key. All other nodes contain at least B keys At least one key at depth 0 At least 2 B keys at depth 1 At least 2 B 2 + B keys at depth 2 At least 2 Bi + Bi-1 + Bi-2 +. . . + B keys at depth i
Proof (continued) ■
Searching a B-tree • Similar to searching a binary search tree. • Multiway branching decision according to the number of the node’s chidren. • Recursive procedure with a time complexity of O(B log B n) for a tree of n keys and order B.
Searching a B-tree B-TREE-SEARCH (x, k) 1 i=1 2 while i ≤ x. n and k > x. keyi do i = i + 1 3 if i ≤ x. n and k == x. keyi then return (x, i) 4 if x. leaf then return NIL 5 else DISK-READ(x. pi) return B-TREE-SEARCH (x. pi , k)
Searching a B-tree • Search for the key F J K P S C G A B D E F H I L M N O Q R T U
Searching a B-tree • Search for the key F J K P S C G A B D E F H I L M N O Q R T U
Searching a B-tree • Search for the key F J K P S C G A B D E F H I L M N O Q R T U
Searching a B-tree • Search for the key N J K P S C G A B D E F H I L M N O Q R T U
Searching a B-tree Lemma: The time complexity of procedure B-TREE-SEARCH is O(B log B n) Proof: • Number of recursive calls is equal to tree’s height. • The height of a B-tree is O(log B n) • Cost between B and 2 B iterations per call. • Total of O(B log B n) steps. ■
Exercise 2 • Suppose that B-TREE-SEARCH is implemented to use binary search rather than linear search within each node. • Show that this changes makes the time complexity O(lg n), independently of how B might be chosen as a function of n.
Exercise 2 Solution: • By using binary search the number of steps of the algorithm becomes O(lg B log B n). • Observe that log B n = lg n / lg B. • Therefore O(lg B log B n) = O(lg n).
Linear or Binary B-tree search ? Lemma: If 1 < B < n then lg n ≤ B log. B n Proof:
Inserting a key into a B-tree • The new key is always inserted into an existing leaf node (why? ) • Firstly we search for the leaf position at which to insert the new key. • If such a node is full we split it. • A split operation splits a full node around its median key into two nodes having B keys each. • Median key moves up into splitted node’s parent (insertion recursive call).
Split operation • Inserting key F into a full node (B = 2) J A C E G K M O Q
Split operation • Node found but already full J A C E FG K M O Q
Split operation • Median key identified J A C E FG K M O Q
Split operation • Splitting the node E J A C F G K M O Q
Inserting a key into a B-tree • Insertion can be propagated upward (B = 2) E J T X A C F G K M O Q UW Y Z
Inserting a key into a B-tree • Insertion can be propagated upward (B = 2) E J T X A C F G K MNO Q UW Y Z
Inserting a key into a B-tree • Insertion can be propagated upward (B = 2) E J N T X A C F G K M O Q SPLIT UW Y Z
Inserting a key into a B-tree • Insertion can be propagated upward (B = 2) N E J A C F G SPLIT K M T X O Q U W Y Z
Inserting a key into a B-tree B-TREE-INSERT (x, k, y) 1 i=1 2 while i ≤ x. n and k < x. keyi do i = i + 1 3 x. n = x. n + 1 4 x. keyi = k 5 x. pi+1 = y 6 for j = x. n downto i+1 do 7 x. keyj = x. keyj-1 8 x. pj = x. pj-1 9 end-for 10 DISK-WRITE(x)
Inserting a key into a B-tree B-TREE-INSERT (x, k) 11 if x. n > 2*B then 12 [m, z] = SPLIT (x) 13 if x. parent != NIL then 14 DISK-READ (x. parent) 15 end-if 16 else 17 x. parent = ALLOCATE-NODE() 18 DISK-WRITE (x) 19 root = x. parent 20 end-else 21 B-TREE-INSERT (x. parent, m, z) 22 end-if
Inserting a key into a B-tree SPLIT (x) 1 z = ALLOCATE-NODE() 2 m = FIND-MEDIAN (x) 3 COPY-GREATER-ELEMENTS(x, m, z) 4 DISK-WRITE (z) 5 COPY-SMALLER-ELEMENTS(x, m, x) 6 DISK-WRITE (x) 7 return [m, z]
Inserting a key into a B-tree • Function B-TREE-INSERT has three arguments: – The node x at which an element of key k should be inserted – The key k to be inserted – A pointer y to the left child of k to be used as one of the pointers of x during insertion process. • There is a global variable named root which is a pointer to the root of the B-Tree. • Observe that the field x. parent was not defined as an original B-tree attribute, but is considered just to simplify the process. • The fields x. leaf should also be updated accordingly.
Inserting a key into a B-tree Lemma: The time complexity of B-TREE-INSERT is O(B log B n) Proof: • Recall that B-TREE-SEARCH function is called first and costs O(log n) by using binary search. Then, B-TREE-INSERT starts by visiting a node and proceeds upward. • At most one node is visited per level/depth and only visited nodes can be splitted. A most one node is created during the insertion process. Cost for splitting is proportional to 2 B. • Number of visited nodes is equal to tree’s height and the height of a B-tree is O(log B n). Cost between B and 2 B iterations per visited node. Total of O(B log B n) steps. ■
Some questions on insertion • Which split operation increases the tree’s height? The split of the root of the tree. • How many DISK-READ operations are executed by the insertion algorithm? Every node was read at least twice. • Does binary search make sense here? Not exactly. We already pay O(B) to split a node (for finding the median).
Drawbacks of our insertion method • Once that the key’s insertion node is found it may be necessary to read its parent node again (due to splitting). • DISK-READ/WRITE operations are expensive and would be executed al least twice for each node in the key’s path. • It would be necessary to store a nodes’s parent or to use the recursion stack to keep its reference. • (Mond and Raz, 1985) provide a solution that spends one DISK-READ/WRITE per visited node (See at CLRS)
Exercise 3 • Show the results of inserting the keys E, H, B, A, F, G, C, J, D, I in order into an empty B-tree of order 1.
Exercise 3 Solution: (final configuration) E B A G I C D F H J
Exercise 4 • Does a B-tree of order 1 is a good choice for a balanced search tree? • What about the expression h ≤ log B (n+1)/2 when B = 1?
Deleting a key from a B-tree • Analogous to insertion but a little more complicated. • A key can be deleted from any node (not just a leaf) and can affect its parent and its children (insertion operation just affect parents). • One must ensure that a node does not get to small during deletion (less than B keys). • As a result deleting a node is the most complex operation on B-trees. It will be considered in 4 particular cases.
Deleting a key from a B-tree • Case 1: The key is in a leaf node with more than B elements. Procedure: § Just remove the key from the node.
Deleting a key from a B-tree • Case 1: The key is in a leaf node with more than B elements (B = 2) N E J A C D F G T X K M O Q U W Y Z
Deleting a key from a B-tree • Case 1: The key is in a leaf node with more than B elements (B = 2) N E J A D F G T X K M O Q U W Y Z
Deleting a key from a B-tree Case 2: The join procedure • The key k 1 to be deleted is in a leaf x with exactly B elements. • Let y be a node that is an “adjacent brother” of x. • Suppose that y has exactly B elements. Procedure: § Remove the key k 1. § Let k 2 be the key that separates nodes x and y in their parent. § Join the nodes x and y and move the key k 2 from the parent to the new joined node. § If the parent of x becomes with B-1 elements and also has an “adjacent brother” with B elements, apply the join procedure recursively for the parent of x (seen as x) and its adjacent brother (seen as y).
Deleting a key from a B-tree • Case 2: Delete key Q (B = 2) F K. . . H I O Q T X U W Y Z
Deleting a key from a B-tree • Case 2: Delete key Q (B = 2) F Parent K. . . H I O Q Node x T X U W Node y Y Z
Deleting a key from a B-tree • Case 2: Delete key Q (B = 2) F Parent K. . . H I T O Node x X U W Node y Y Z
Deleting a key from a B-tree • Case 2: Delete key Q (B = 2) F Parent K. . . H I T O Node x X U W Node y Y Z
Deleting a key from a B-tree • Case 2: Delete key Q (B = 2) F Parent K. . . H I X O T U W Join Y Z
Deleting a key from a B-tree • Case 2: Delete key Q (B = 2) F K. . . H I X O T U W Y Z
Deleting a key from a B-tree Case 3: join and split • The key k 1 to be deleted is in a leaf x with exactly B elements. • Let y be a node that is an “adjacent brother” of x. • Suppose that y has more than B elements. Procedure: § Remove the key k 1. § Let k 2 be the key that separates nodes x and y in their parent. § Join the nodes x and y and move the key k 2 from the parent to the new joined node z. § Find the median key m of z § Determine the new nodes x and y by splitting z around m. § Insert m into the parent of x and y.
Deleting a key from a B-tree • Case 3: Delete key F (B = 2) N E J A C D F G T X K M O Q U W Y Z
Deleting a key from a B-tree • Case 3: Delete key F (B = 2) N E J A C D F G T X K M O Q U W Y Z
Deleting a key from a B-tree • Case 3: Delete key F (B = 2) N E J A C D G T X K M O Q U W Y Z
Deleting a key from a B-tree • Case 3: Delete key F (B = 2) N Parent E J A C D Node y G Node x T X K M O Q U W Y Z
Deleting a key from a B-tree • Case 3: Delete key F (B = 2) N Parent E J A C D Node y G Node x T X K M O Q U W Y Z
Deleting a key from a B-tree • Case 3: Delete key F (B = 2) N J T X Join A C D E G K M O Q U W Y Z
Deleting a key from a B-tree • Case 3: Delete key F (B = 2) N J T X Median key A C D E G K M O Q U W Y Z
Deleting a key from a B-tree • Case 3: Delete key F (B = 2) N D J T X Split A C E G K M O Q U W Y Z
Deleting a key from a B-tree • Case 3: Delete key F (B = 2) N D J A C E G T X K M O Q U W Y Z
Deleting a key from a B-tree Case 4: internal node • The key k 1 to be deleted is in a node x that is not a leaf or a root. Procedure: § Let k 2 be the smallest key that is greater than k 1. § Let y be the node of k 2, which will be a leaf. § Insert key k 2 into x. § Remove the key k 1 from x. § Solve now the problem of removing k 2 from a leaf y, previously considered.
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N D J A C E G T X K M O Q U W Y Z
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G T X K M O Q U W Y Z
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G T X K M O Q U W Node y Y Z
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G U X K M O Q W Node y Y Z
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G U X K M O Q W Node y Y Z
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G U X K M O Q W Node y Y Z
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G U X K M O Q W Node y Y Z
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G U K M O Q W X Y Z Node y
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G U K M O Q W X Y Z Node y
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G U K M O Q W X Y Z Node y
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G U K M O Q W X Y Z Node y
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) N Node x D J A C E G U K M O Q W X Y Z Node y
Deleting a key from a B-tree • Case 4: Delete key T (B = 2) D A C E G J K M N U O Q W X Y Z
- Slides: 76