B-TREE Michael Tsai 2017/06/06
2 B-Tree Overview • Balanced search tree • Very large “branching factor” • Height = O(log n), but much smaller than that of a red-black (RB) tree • Usage: large amounts of data, stored partially in memory and partially in secondary storage (e.g., hard drive) • Goals: 1. Minimize disk I/O operations 2. Minimize CPU time
3 Typical Storage Speed / Capacity
Storage      Read speed                             Capacity
Hard drive   Typically ~100 MB/s                    Up to 10 TB
SSD          ~500 MB/s                              Up to 1 TB
SD           30 MB/s (UHS-3), 10 MB/s (class 10)
Memory       6400 MB/s (DDR3)                       Up to 1 TB (typically 32 GB or 64 GB); desktop/laptop has 4~16 GB
4 Typical B-Tree (keys) • Internal node x has x.n keys (e.g., 3) • Keys in x separate the ranges of keys in its subtrees • Internal node x has x.n+1 children (e.g., 4)
5 Typical B-Tree (search) • Internal node x has x.n keys (e.g., 3) • Keys in x separate the ranges of keys in its subtrees • Internal node x has x.n+1 children (e.g., 4)
6 A more realistic B-tree Usually only one node’s keys/data are read from disk at a time. The root is always kept in memory.
7 B-Tree Definition (1) • A B-tree is a rooted tree • For each node x: • x.n is the number of keys in x • Keys: $x.key_1 \le x.key_2 \le \dots \le x.key_{x.n}$ are stored in non-decreasing order • x.leaf is TRUE if x is a leaf and FALSE otherwise • Each internal node x contains x.n+1 pointers $x.c_1, x.c_2, \dots, x.c_{x.n+1}$ to its children. Leaves have these undefined.
8 B-Tree Definitions (2) • The keys separate the ranges of keys stored in each subtree: if $k_i$ is any key stored in the subtree with root $x.c_i$, then $k_1 \le x.key_1 \le k_2 \le x.key_2 \le \dots \le x.key_{x.n} \le k_{x.n+1}$ • All leaves have the same depth: the tree’s height h • Minimum degree of a B-tree: $t \ge 2$ • Every node other than the root has at least t-1 keys (thus at least t children if it is internal) • Every node has at most 2t-1 keys (thus at most 2t children if it is internal); a node with 2t-1 keys is full
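The definition above can be turned into a small checker. This is only a sketch: the `Node` class and the `check` function are hypothetical names (not from the slides), keys are integers, and key/child lists are 0-indexed rather than 1-indexed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def check(x: Node, t: int, lo: Optional[int] = None,
          hi: Optional[int] = None, is_root: bool = True) -> int:
    """Return the height of the subtree rooted at x, or raise AssertionError."""
    n = len(x.keys)
    assert n <= 2 * t - 1, "more than 2t-1 keys"
    assert is_root or n >= t - 1, "non-root node with fewer than t-1 keys"
    assert all(a <= b for a, b in zip(x.keys, x.keys[1:])), "keys not sorted"
    for k in x.keys:
        # Every key must lie in the range dictated by the ancestors' keys.
        assert lo is None or lo <= k, "key below its allowed range"
        assert hi is None or k <= hi, "key above its allowed range"
    if x.leaf:
        return 0
    assert len(x.children) == n + 1, "internal node needs n+1 children"
    bounds = [lo] + x.keys + [hi]
    heights = {check(c, t, bounds[i], bounds[i + 1], False)
               for i, c in enumerate(x.children)}
    assert len(heights) == 1, "leaves at different depths"
    return heights.pop() + 1


root = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
print(check(root, t=3))  # height of this two-level tree
```

The range bounds `lo` and `hi` are passed down so that a key is checked against all ancestors, not only its parent.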
9 Proof: B-Tree Height If $n \ge 1$, then for any n-key B-tree T of height h and minimum degree $t \ge 2$, $h \le \log_t \frac{n+1}{2}$. • Proof: Consider the case with each node having the least # of keys.
10 Proof: B-Tree Height
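A sketch of the standard counting argument for the bound $h \le \log_t \frac{n+1}{2}$, assuming the minimum-key case: the root has at least 1 key and every other node has at least t-1 keys. Then there are at least 2 nodes at depth 1, at least 2t nodes at depth 2, and in general at least $2t^{i-1}$ nodes at depth i, so:

```latex
\begin{align*}
n &\ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} \\
  &= 1 + 2(t-1)\cdot\frac{t^{h}-1}{t-1} \\
  &= 2t^{h} - 1 ,
\end{align*}
\text{hence } t^{h} \le \frac{n+1}{2}
\quad\text{and}\quad
h \le \log_t \frac{n+1}{2} .
```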
11 Disk Operation • DISK-READ(x): required before accessing x if x is not in memory; a no-op if x is already in memory. • DISK-WRITE(x): required to put any changes to x back on disk. • The root is always stored in memory. • Typical workflow: 1. x = pointer to an object 2. DISK-READ(x) 3. (operations that modify x) 4. DISK-WRITE(x) 5. (operations that access x but do not modify it)
12 Search in B-Tree Input: x: the node to search from; k: the key to search for. Return value: (x, i), meaning key k is found as the i-th key of node x. CPU time: O(th) = O(t log_t n). Disk I/O: O(h) = O(log_t n).
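A minimal in-memory sketch of the search just described. The `Node` class and `disk_read` are hypothetical stand-ins (here `disk_read` does nothing, since every node already lives in memory), indices are 0-based, and returning `None` for a missing key is a choice made for this sketch.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted list of keys
        self.children = children or []    # len(keys) + 1 children, or empty
        self.leaf = not self.children


def disk_read(x):
    """Hypothetical stand-in for DISK-READ(x); a no-op in this sketch."""
    return x


def btree_search(x, k):
    """Return (node, index) where k is stored, or None if k is absent."""
    i = 0
    # Linear scan over at most 2t-1 keys: O(t) CPU time per node.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if x.leaf:
        return None
    # One disk read per level: O(h) disk I/O in total.
    return btree_search(disk_read(x.children[i]), k)


root = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
node, idx = btree_search(root, 15)
print(node.keys, idx)
```

Replacing the linear scan with a binary search would reduce the per-node CPU cost to O(log t) without changing the number of disk reads.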
13 Create an empty B-Tree Allocate-Node() is an O(1) operation that allocates a disk page to store a new node. CPU time: O(1) Disk I/O: O(1)
14 B-Tree Insertion: Overview • Cannot simply create a new leaf node and insert it: this would violate the B-tree definition • Sol: insert into an existing leaf node • Problem: what if that leaf node is already FULL? • FULL: having 2t-1 keys (and 2t children if the node is internal) • Sol: split a full node y around its median key into two nodes: one with the t-1 keys smaller than the median and one with the t-1 keys larger than the median • Then move the median key up into y’s parent node • What if y’s parent is also full? We split it too • Workflow: start from the root (search) and split every full node traversed on the way down, so the parent of a node being split is never full
15 B-Tree Insertion: Overview (example: t = 4, so a full node has 2t-1 = 7 keys)
16 B-Tree: Split Full Child Split node x’s i-th child, which is full. CPU time: O(t) Disk I/O: O(1)
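A minimal sketch of splitting a full child, assuming an in-memory `Node` class (hypothetical, not from the slides) and 0-based indices. The disk writes are shown only as a comment.

```python
class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


def split_child(x, i, t):
    """Split x.children[i], which must be full; x itself must not be full."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child is not full"
    z = Node(leaf=y.leaf)
    median = y.keys[t - 1]
    z.keys = y.keys[t:]           # the t-1 keys larger than the median
    y.keys = y.keys[:t - 1]       # the t-1 keys smaller than the median
    if not y.leaf:
        z.children = y.children[t:]
        y.children = y.children[:t]
    x.keys.insert(i, median)      # the median moves up into the parent
    x.children.insert(i + 1, z)
    # A disk-backed version would DISK-WRITE y, z and x here: O(1) disk I/O.


x = Node(leaf=False)
y = Node(leaf=True)
y.keys = [1, 2, 3, 4, 5, 6, 7]    # full for t = 4
x.children = [y]
split_child(x, 0, t=4)
print(x.keys, [c.keys for c in x.children])
```

The list slicing and insertion touch O(t) keys and pointers, which is where the O(t) CPU time comes from.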
17 B-Tree: Split the Root • Splitting the root is the only way to increase the height of a B-tree • Height is increased at the top, not at the bottom
18 B-Tree: Split the Root • Please review the pseudo code below yourself.
19 Insertion Example
20 B-Tree Insertion: pseudo code • If x is a leaf, insert k at the right location • If x is not a leaf, then: 1. Find the right child node 2. If the child node is full, first split it (its median key will move up into this node) 3. Finally, make a recursive call to continue into the child node
21 B-Tree Insertion: running time CPU time: O(th) = O(t log_t n) Disk I/O: O(h) = O(log_t n)
22 Reading Assignment (Real) • Chapter 18.3 Deleting a key from a B-tree