B-Trees
CS 583 Analysis of Algorithms
12/29/2021 CS 583 Fall'06: B-Trees
Outline
• Data Structures on Secondary Storage
  – Magnetic disks
  – Efficient operations
• B-Trees
  – Definitions
  – Searching
  – Inserting
• Self-test
  – 18.1-1, 18.1-2, 18.2-1, 18.2-2
Magnetic Disks
• The main memory of a computer system consists of silicon memory chips.
  – It is typically about two orders of magnitude more expensive per bit than magnetic storage.
• Magnetic disks are cheaper and have higher capacity than main memory.
  – However, they are much slower because they have moving parts.
  – To amortize the time spent on mechanical movement, disks access several items at once.
• Information is divided into equal-size pages.
• Each page is a run of consecutive bits within a cylinder.
• Once the read/write head is positioned at the desired page, large amounts of data can be accessed quickly.
Disk Operations
When x is an object that resides on disk, the following pseudocode conventions are used:
    x = <a pointer to some object>
    Disk-Read(x)
    <access and modify fields of x>
    Disk-Write(x)
In most systems, the running time of a B-tree algorithm is determined mainly by the number of Disk-Read and Disk-Write operations it performs. Hence, a B-tree node is usually as large as a whole disk page.
Example: a B-tree with a branching factor of 1001 and height 2 can store over one billion keys. Since the root node is kept in main memory, at most two disk accesses are needed to find any key!
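The arithmetic behind the billion-key example can be checked with a short Python sketch. It assumes that "branching factor 1001" means every node is completely full, holding 1000 keys, with each internal node having 1001 children; `max_keys` is a hypothetical helper, not part of the slides.

```python
def max_keys(branching: int, height: int) -> int:
    """Keys in a B-tree of the given height in which every node is full:
    each node holds branching - 1 keys, each internal node has `branching` children."""
    nodes = sum(branching ** depth for depth in range(height + 1))  # nodes at depths 0..height
    return (branching - 1) * nodes


print(max_keys(1001, 2))  # 1003003000: (1 + 1001 + 1001**2) nodes, 1000 keys each
```

With the root cached in memory, reaching any node at depth 2 costs one Disk-Read per level below the root, which gives the "at most two disk accesses" figure.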
B-tree Definition
• We assume that any satellite information associated with a key is stored in the same node as the key.
• A B-tree is a rooted tree with the following properties:
  – Every node x has the following fields:
    • n[x], the number of keys currently stored in x;
    • the n[x] keys themselves, stored in non-decreasing order: key_1[x] <= key_2[x] <= ... <= key_{n[x]}[x];
    • leaf[x], which is TRUE if x is a leaf and FALSE otherwise.
  – Each internal node x contains n[x]+1 pointers to its children: c_1[x], c_2[x], ..., c_{n[x]+1}[x].
B-tree Definition (cont.)
• Properties (cont.):
  – The keys key_i[x] separate the ranges of keys stored in each subtree: if k_i is any key stored in the subtree with root c_i[x], then
      k_1 <= key_1[x] <= k_2 <= key_2[x] <= ... <= key_{n[x]}[x] <= k_{n[x]+1}
  – All leaves have the same depth, which is the tree's height h.
  – There are lower and upper bounds on the number of keys in a node. They are expressed in terms of a fixed integer t >= 2 called the minimum degree:
    • Every node other than the root must have at least t-1 keys, so every internal node other than the root has at least t children. If the tree is non-empty, the root must have at least one key.
    • Every node can contain at most 2t-1 keys, so an internal node has at most 2t children. We say a node is full if it contains exactly 2t-1 keys.
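The properties above can be checked mechanically. The Python sketch below is a hypothetical checker (`Node` and `check_btree` are illustration names, not from the slides). It represents a node by a sorted key list plus a child list, treats an empty child list as leaf[x] = TRUE, uses 0-based indexing, and assumes the tree is non-empty so that the root must hold at least one key.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty list => leaf[x] is TRUE


def check_btree(root: Node, t: int) -> int:
    """Return the height h if the (non-empty) tree rooted at `root` satisfies the
    B-tree properties for minimum degree t; raise AssertionError otherwise."""
    leaf_depths: Set[int] = set()

    def visit(x: Node, depth: int, lo: Optional[int], hi: Optional[int], is_root: bool) -> None:
        n = len(x.keys)
        # At most 2t-1 keys; at least t-1 keys except at the root, which needs at least 1.
        assert n <= 2 * t - 1, "node has more than 2t-1 keys"
        assert n >= (1 if is_root else t - 1), "node has too few keys"
        # Keys are non-decreasing and lie inside the range given by the parent's separators.
        assert all(a <= b for a, b in zip(x.keys, x.keys[1:])), "keys out of order"
        assert all((lo is None or lo <= k) and (hi is None or k <= hi) for k in x.keys), "key outside range"
        if not x.children:
            leaf_depths.add(depth)
            return
        assert len(x.children) == n + 1, "internal node must have n[x]+1 children"
        bounds = [lo] + x.keys + [hi]  # child i gets keys between bounds[i] and bounds[i+1]
        for i, child in enumerate(x.children):
            visit(child, depth + 1, bounds[i], bounds[i + 1], False)

    visit(root, 0, None, None, True)
    assert len(leaf_depths) == 1, "leaves are not all at the same depth"
    return leaf_depths.pop()


root = Node([10], [Node([5]), Node([15, 20])])
print(check_btree(root, t=2))  # 1
```

The separator check mirrors the chain k_1 <= key_1[x] <= k_2 <= ...: each child inherits the pair of parent keys that surround it.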
Height of the Tree
The number of disk accesses for most B-tree operations is proportional to the height of the tree.
Theorem 18.1. If n >= 1, then for any n-key B-tree T of height h and minimum degree t >= 2,
    $h \le \log_t \frac{n+1}{2}$.
Proof. In a B-tree of height h, the root contains at least one key and all other nodes contain at least t-1 keys. Since the root therefore has at least 2 children and every other internal node has at least t children, there are at least 2 nodes at depth 1, at least 2t nodes at depth 2, and so on, down to at least $2t^{h-1}$ nodes at depth h.
Height of the Tree (cont.)
The number of keys n therefore satisfies
    $n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\cdot\frac{t^h - 1}{t-1} = 2t^h - 1$,
so $t^h \le \frac{n+1}{2}$, and taking base-t logarithms gives $h \le \log_t \frac{n+1}{2}$.
Hence the height of a B-tree grows as $O(\log_t n)$. Asymptotically this is still $O(\lg n)$, as for a red-black tree, but because the base t can be large, the B-tree height is smaller by a factor of about $\lg t$. This means that the number of disk accesses is substantially reduced for most tree operations.
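The derivation can be sanity-checked numerically. The Python sketch below uses two hypothetical helpers: `min_keys` adds up the minimum key counts level by level, exactly as in the sum, and `height_bound` evaluates the right-hand side of Theorem 18.1.

```python
import math


def min_keys(t: int, h: int) -> int:
    """Fewest keys in a B-tree of minimum degree t and height h: one key at the
    root, then 2*t**(i-1) nodes holding t-1 keys each at every depth i = 1..h."""
    return 1 + sum((t - 1) * 2 * t ** (i - 1) for i in range(1, h + 1))


def height_bound(n: int, t: int) -> float:
    """Right-hand side of Theorem 18.1: h <= log_t((n + 1) / 2)."""
    return math.log((n + 1) / 2, t)


# The geometric sum collapses to the closed form 2*t**h - 1.
for t in range(2, 6):
    for h in range(6):
        assert min_keys(t, h) == 2 * t ** h - 1

# A sparsest tree meets the bound with equality (up to float rounding).
assert abs(height_bound(min_keys(3, 4), 3) - 4) < 1e-9

# Compare with the lg n height scale of a red-black tree for n = 10**9 keys.
print(height_bound(10**9, 1000))  # about 2.9, so h <= 2 when t = 1000
print(math.log2(10**9))           # about 29.9
```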
Basic Operations
• The root of the B-tree is always kept in main memory.
  – A Disk-Read on the root is never required.
  – A Disk-Write is required whenever the root node is changed.
• Any node passed as a parameter has already had Disk-Read performed on it.
• All the basic procedures are "one-pass" algorithms:
  – They proceed downward from the root of the tree, without having to back up.
Searching
The search algorithm takes as input a pointer to the root node x of a subtree and a key k to search for. If k is in the subtree, it returns a pair (y, i) such that key_i[y] = k; otherwise it returns NIL.
B-Tree-Search(x, k)
1   i = 1
2   while i <= n[x] and k > key_i[x]
3       i++
4   if i <= n[x] and k = key_i[x]
5       return (x, i)
6   if leaf[x]
7       return NIL
8   else
9       Disk-Read(c_i[x])   // read the ith child of x
10      return B-Tree-Search(c_i[x], k)
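A Python sketch of B-Tree-Search, assuming an in-memory, list-based `Node` (a hypothetical representation with 0-based indices, where an empty child list means a leaf). Descending into a child stands in for Disk-Read; the optional `stats` dictionary is an illustrative addition that counts those simulated reads, so the O(h) bound on disk accesses can be observed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty list => leaf


def btree_search(x: Node, k: int, stats: Optional[Dict[str, int]] = None) -> Optional[Tuple[Node, int]]:
    """Return (node, i) with node.keys[i] == k, or None (NIL) if k is absent.
    Indices are 0-based, unlike the 1-based pseudocode."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:   # lines 2-3: find the first key >= k
        i += 1
    if i < len(x.keys) and k == x.keys[i]:     # lines 4-5: found in this node
        return (x, i)
    if not x.children:                          # lines 6-7: leaf reached, k is absent
        return None
    if stats is not None:                       # line 9: count the simulated Disk-Read
        stats["disk_reads"] = stats.get("disk_reads", 0) + 1
    return btree_search(x.children[i], k, stats)  # line 10: descend into c_i[x]


root = Node([10, 20], [Node([2, 5]), Node([12, 15]), Node([25, 30])])
stats: Dict[str, int] = {}
node, i = btree_search(root, 15, stats)
print(node.keys[i], stats["disk_reads"])  # 15 1
print(btree_search(root, 7))              # None
```

The root is searched without a counted read, matching the convention that the root is always in main memory.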
Searching: Performance
• The nodes encountered during the recursion form a path downward from the root of the tree.
• The number of disk pages accessed by B-Tree-Search is therefore O(h) = O(log_t n).
• For each node, n[x] < 2t, hence the while loop in lines 2-3 takes O(t) time per node.
• Therefore the total CPU time is O(th) = O(t log_t n).
Inserting
• General algorithm:
  – Search for the leaf node y into which the new key should be inserted.
  – If y is full (has 2t-1 keys):
    • Split y around its median key key_t[y]:
      – Create two nodes with t-1 keys each.
      – Move the median key up into y's parent.
    • If y's parent is also full, it must be split as well, and splits can propagate all the way up the tree.
• To avoid backing up, the key is inserted in a single pass down the tree:
  – Each full node encountered on the way down is split.
  – This ensures that whenever a node y needs to be split, its parent is not full.
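For the key lists alone, the median split can be sketched in a few lines of Python. This is a simplification: `split_full_keys` is a hypothetical helper that ignores child pointers and disk pages, and it places the median in the parent with `bisect.insort`, whereas the real procedure already knows the insertion position from the child index.

```python
import bisect
from typing import List, Tuple


def split_full_keys(parent_keys: List[int], full_keys: List[int], t: int) -> Tuple[List[int], List[int], List[int]]:
    """Split a full node's 2t-1 sorted keys around the median key_t (index t-1
    when 0-based) and move the median into a copy of the parent's key list.
    Returns (new parent keys, left node keys, right node keys)."""
    assert len(full_keys) == 2 * t - 1, "only a full node is split"
    assert len(parent_keys) < 2 * t - 1, "one-pass insertion keeps the parent non-full"
    left, median, right = full_keys[: t - 1], full_keys[t - 1], full_keys[t:]
    new_parent = parent_keys.copy()
    bisect.insort(new_parent, median)  # keep the parent's keys sorted
    return new_parent, left, right


print(split_full_keys([50], [3, 8, 14, 20, 31], t=3))  # ([14, 50], [3, 8], [20, 31])
```

Both halves end up with exactly t-1 keys, so each is a legal non-root node.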
Splitting a Node
• The procedure B-Tree-Split-Child takes as input a non-full internal node x, an index i, and a full child y = c_i[x] of x.
• The procedure splits y in two and adjusts x so that it has one additional child.
• When the root is full, a new root is created first, the old root becomes its only child, and that child is then split.
  – The tree grows in height by one.
  – Splitting the root is the only way a B-tree grows in height.
Splitting Node: Pseudocode
B-Tree-Split-Child(x, i, y)
1   z = Allocate-Node()        // allocate a disk page
2   leaf[z] = leaf[y]
3   n[z] = t-1
4   for j = 1 to t-1           // z gets the upper t-1 keys of y
5       key_j[z] = key_{j+t}[y]
6   if not leaf[y]
7       for j = 1 to t         // z gets the upper t children of y
8           c_j[z] = c_{j+t}[y]
9   n[y] = t-1
10  for j = n[x]+1 downto i+1  // shift the children of x to the right
11      c_{j+1}[x] = c_j[x]
12  c_{i+1}[x] = z             // add z as a new child of x
Splitting Node: Pseudocode (cont.)
13  for j = n[x] downto i      // make room for the median
14      key_{j+1}[x] = key_j[x]
15  key_i[x] = key_t[y]        // move the median of y up into x
16  n[x]++
17  Disk-Write(y)
18  Disk-Write(z)
19  Disk-Write(x)
The CPU time is determined by the loops in lines 4-5 and 7-8, which take Θ(t) time. The other loops perform O(t) iterations. The procedure performs Θ(1) disk operations.
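A Python sketch of the same split, assuming an in-memory, list-based `Node` (a hypothetical representation with 0-based indices, where an empty child list means a leaf) rather than disk pages. List slicing and `list.insert` stand in for the explicit loops in lines 4-14, and the three Disk-Write calls are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty list => leaf


def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child y = x.children[i] (0-based i) around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "y must be full"
    assert len(x.keys) < 2 * t - 1, "x must be non-full"
    z = Node(keys=y.keys[t:], children=y.children[t:])  # upper t-1 keys, upper t children
    median = y.keys[t - 1]
    y.keys = y.keys[: t - 1]                            # lower t-1 keys stay in y
    y.children = y.children[:t]                         # lower t children (none for a leaf)
    x.children.insert(i + 1, z)                         # z becomes the child right after y
    x.keys.insert(i, median)                            # the median moves up into x


x = Node([50], [Node([10, 20, 30]), Node([60])])
split_child(x, 0, t=2)
print(x.keys, [c.keys for c in x.children])  # [20, 50] [[10], [30], [60]]
```

Slicing copies up to 2t-1 references, so the sketch keeps the Θ(t) CPU cost of the pseudocode.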
Inserting a Key: Algorithm
B-Tree-Insert(T, k)
    r = root[T]
    if n[r] = 2t-1                   // the root is full
        s = Allocate-Node()          // s becomes the new root
        root[T] = s
        leaf[s] = FALSE
        n[s] = 0
        c_1[s] = r
        B-Tree-Split-Child(s, 1, r)  // split the old root
        B-Tree-Insert-Nonfull(s, k)
    else
        B-Tree-Insert-Nonfull(r, k)
Inserting a Key: Algorithm (cont.)
// Insert key k into a non-full node x
B-Tree-Insert-Nonfull(x, k)
    i = n[x]
    if leaf[x]    // insert k into the ordered list of keys
        while i >= 1 and k < key_i[x]
            key_{i+1}[x] = key_i[x]
            i--
        key_{i+1}[x] = k
        n[x]++
        Disk-Write(x)
    else          // find the child whose subtree should receive k
Inserting a Key: Algorithm (cont.)
        while i >= 1 and k < key_i[x]
            i--
        i++
        Disk-Read(c_i[x])
        if n[c_i[x]] = 2t-1    // the child is full
            B-Tree-Split-Child(x, i, c_i[x])
            if k > key_i[x]
                i++
        B-Tree-Insert-Nonfull(c_i[x], k)
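Putting the pieces together, here is a hedged, in-memory Python sketch of B-Tree-Insert with one-pass, top-down splitting. It assumes a hypothetical list-based `Node` and `BTree` class (0-based indices, empty child list means a leaf); Disk-Read and Disk-Write are not modeled. The tail-recursive call to B-Tree-Insert-Nonfull is written as a loop, which walks the same root-to-leaf path.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty list => leaf


class BTree:
    def __init__(self, t: int):
        assert t >= 2, "minimum degree must be at least 2"
        self.t = t
        self.root = Node()

    def _split_child(self, x: Node, i: int) -> None:
        """B-Tree-Split-Child for the full child x.children[i]."""
        t = self.t
        y = x.children[i]
        z = Node(keys=y.keys[t:], children=y.children[t:])
        x.keys.insert(i, y.keys[t - 1])  # median moves up into x
        x.children.insert(i + 1, z)
        y.keys = y.keys[: t - 1]
        y.children = y.children[:t]

    def insert(self, k: int) -> None:
        r = self.root
        if len(r.keys) == 2 * self.t - 1:  # full root: the tree grows by one level
            s = Node(children=[r])
            self.root = s
            self._split_child(s, 0)
            self._insert_nonfull(s, k)
        else:
            self._insert_nonfull(r, k)

    def _insert_nonfull(self, x: Node, k: int) -> None:
        while True:
            i = len(x.keys) - 1
            while i >= 0 and k < x.keys[i]:  # skip keys greater than k
                i -= 1
            if not x.children:               # leaf: k goes right after keys[i]
                x.keys.insert(i + 1, k)
                return
            i += 1                           # x.children[i] is the subtree for k
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)      # split before descending
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]                # descend (loop replaces the recursion)

    def height(self) -> int:
        h, x = 0, self.root
        while x.children:                    # all leaves share one depth
            h, x = h + 1, x.children[0]
        return h

    def in_order(self) -> List[int]:
        out: List[int] = []

        def walk(x: Node) -> None:
            for i, key in enumerate(x.keys):
                if x.children:
                    walk(x.children[i])
                out.append(key)
            if x.children:
                walk(x.children[-1])

        walk(self.root)
        return out


tree = BTree(t=3)
for k in range(1, 1001):
    tree.insert(k)
assert tree.in_order() == list(range(1, 1001))
assert tree.height() <= math.log((1000 + 1) / 2, 3)  # Theorem 18.1
print(tree.height())
```

Because every full node on the path is split before the descent continues, the sketch never has to back up, which is the one-pass property described for the basic operations.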
Inserting a Key: Performance
• The number of disk accesses performed by B-Tree-Insert is O(h) for a B-tree of height h.
  – Only O(1) Disk-Read and Disk-Write operations are performed at each level in B-Tree-Insert-Nonfull.
• The total CPU time is O(th) = O(t log_t n).
  – At each level of the tree, the number of CPU operations is determined by the while loops in B-Tree-Insert-Nonfull and by any call to B-Tree-Split-Child.
  – These loops run for at most 2t-1 iterations, hence the total time at each level is O(t).