
NOTES ON DYNAMIC HASHING

Dynamic hashing techniques allow the hash structure to be modified dynamically to accommodate the growth or shrinkage of the database. One such technique is extendable hashing.

[Figure: record R(K) -> Hash Function -> Bucket Address Table with hash prefix i and entries 00, 01, 10, 11 -> Bucket 1 (i_1), Bucket 2 (i_2), Bucket 3 (i_3)]

Figure 1 – Architecture of Extendable Hashing

The architecture of extendable hashing consists of:

1. A hash function
2. A Bucket Address Table (BAT), which is a variable-size index table
3. A variable number of buckets in the prime data area

In extendable hashing, we choose a hash function that generates values over a relatively large range, namely b-bit binary integers. Typically, b = 32. We do not create 2^32 (about 4 billion) buckets, one for each hash value. Instead, we create buckets on demand, as records are inserted into the file.

We do not use all b bits of the hash value initially. At any point, we use only the leading i bits, where 0 <= i <= b. Here, i is called the hash prefix. Each bucket, as well as the BAT, has an assigned hash prefix. The hash prefix i_j of a bucket never exceeds the hash prefix i of the BAT; when i_j < i, several BAT entries (2^(i - i_j) of them) point to the same bucket.
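To make the role of the hash prefix concrete, here is a minimal sketch of how the leading i bits of a b-bit hash value could be extracted and used as an index into the BAT. The function name `hash_prefix` and the sample value `h` are illustrative assumptions, not part of the notes.

```python
def hash_prefix(h: int, i: int, b: int = 32) -> int:
    """Return the i most significant bits of the b-bit hash value h."""
    if not 0 <= i <= b:
        raise ValueError("i must satisfy 0 <= i <= b")
    # With i = 0 the result is always 0, so every key maps to the single BAT entry.
    return h >> (b - i)


# Hypothetical 32-bit hash value whose leading bits are 10110101.
h = 0b10110101_00000000_00000000_00000000

for i in range(4):
    # Print the BAT index selected when i bits of the hash are in use.
    print(i, hash_prefix(h, i))
```

With three bits in use, this value selects BAT entry 5, that is, the entry labelled 101.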

Example of Extendable Hashing

Assume that bucket capacity = 2 records. We have five records, inserted into the hashed file in the order shown in the table below. The key value of each record hashes to the bit string shown.

Loading Sequence   Key   Hash Value
R1                 K1    11110001…
R2                 K2    01011000…
R3                 K3    10100011…
R4                 K4    10110101…
R5                 K5    10000111…

Initial State: There is only one empty bucket, and one entry in the BAT containing a pointer to the empty bucket.

BAT (i = 0)
  (single entry) -> Bucket 1 (i_1 = 0): Empty

Steps 1 and 2: Insert R1 and then R2.

BAT (i = 0)
  (single entry) -> Bucket 1 (i_1 = 0): R1 (1111…), R2 (0101…)

Step 3: Insert R3 (1010…)

BAT (i = 1)
  0 -> Bucket 1 (i_1 = 1): R2 (0101…)
  1 -> Bucket 2 (i_2 = 1): R1 (1111…), R3 (1010…)

In Step 3, because the only available bucket is full, the bucket is “split”. That is, a new bucket is acquired; the number of hash bits in use is incremented by 1, so the hash prefix of the BAT and of the split bucket is incremented by 1; the size of the BAT is doubled; the records in Bucket 1 and the new record are distributed between Bucket 1 and the new bucket, Bucket 2, according to the first bit of their hash values; and the pointers in the BAT are adjusted accordingly.

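Step 4: Insert R4 (1011…)

BAT (i = 2)
  00, 01 -> Bucket 1 (i_1 = 1): R2 (0101…)
  10     -> Bucket 2 (i_2 = 2): R3 (1010…), R4 (1011…)
  11     -> Bucket 3 (i_3 = 2): R1 (1111…)

This is another example of bucket splitting. R4 hashes to Bucket 2, which is already full with R1 and R3. Because the hash prefix of Bucket 2 equals that of the BAT, the BAT is doubled again, and the second hash bit separates R3 and R4 (10) from R1 (11). Bucket 1 is not affected, so both entries 00 and 01 point to it.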

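Step 5: Insert R5 (1000…)

BAT (i = 3)
  000, 001, 010, 011 -> Bucket 1 (i_1 = 1): R2 (0101…)
  100                -> Bucket 4 (i_4 = 3): R5 (1000…)
  101                -> Bucket 2 (i_2 = 3): R3 (1010…), R4 (1011…)
  110, 111           -> Bucket 3 (i_3 = 2): R1 (1111…)

Yet another example of bucket splitting. R5 hashes to Bucket 2, which is full with R3 and R4. A third hash bit is needed to separate R5 (100) from R3 and R4 (both 101), so the BAT is doubled once more and a new bucket, Bucket 4, is acquired.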
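The insert-and-split behaviour walked through in Steps 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the class and method names (`Bucket`, `ExtendableHashFile`, `snapshot`) are invented for this sketch, the 8 bits shown in the table are treated as the whole hash value, and buckets are not numbered, so which physical bucket keeps which half after a split may differ from the figures.

```python
B = 8         # assumption: treat the 8 bits shown in the table as the whole hash value
CAPACITY = 2  # bucket capacity used in the example


class Bucket:
    def __init__(self, prefix):
        self.prefix = prefix   # number of leading hash bits this bucket depends on
        self.records = {}      # record name -> hash value


class ExtendableHashFile:
    def __init__(self):
        self.i = 0               # hash prefix of the BAT
        self.bat = [Bucket(0)]   # always holds 2**i bucket pointers

    def _entry(self, h):
        return h >> (B - self.i)  # leading i bits of h select the BAT entry

    def insert(self, name, h):
        bucket = self.bat[self._entry(h)]
        while len(bucket.records) >= CAPACITY:   # target is full: split, then retry
            self._split(bucket)
            bucket = self.bat[self._entry(h)]
        bucket.records[name] = h

    def _split(self, old):
        if old.prefix == B:
            raise OverflowError("all hash bits are already in use")
        if old.prefix == self.i:
            # The bucket already uses every BAT bit: double the BAT first.
            self.bat = [b for b in self.bat for _ in range(2)]
            self.i += 1
        old.prefix += 1
        new = Bucket(old.prefix)
        shift = self.i - old.prefix
        for entry, b in enumerate(self.bat):
            # Entries whose newly examined bit is 1 now point to the new bucket.
            if b is old and (entry >> shift) & 1:
                self.bat[entry] = new
        moved, old.records = old.records, {}
        for name, h in moved.items():            # redistribute the old contents
            self.bat[self._entry(h)].records[name] = h

    def snapshot(self):
        """Map each BAT entry (as a bit string) to (bucket prefix, record names)."""
        return {format(e, "b").zfill(self.i): (b.prefix, sorted(b.records))
                for e, b in enumerate(self.bat)}


f = ExtendableHashFile()
for name, h in [("R1", 0b11110001), ("R2", 0b01011000), ("R3", 0b10100011),
                ("R4", 0b10110101), ("R5", 0b10000111)]:
    f.insert(name, h)

for entry, state in f.snapshot().items():
    print(entry, state)
```

After the five inserts the BAT has prefix 3: entries 000 to 011 reach the bucket holding R2 (prefix 1), entry 100 reaches R5 (prefix 3), entry 101 reaches R3 and R4 (prefix 3), and entries 110 and 111 reach R1 (prefix 2), which agrees with the state shown for Step 5.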

Step 6: Delete R4 (1011…)

BAT (i = 2)
  00, 01 -> Bucket 1 (i_1 = 1): R2 (0101…)
  10     -> Bucket 2 (i_2 = 2): R3 (1010…), R5 (1000…)
  11     -> Bucket 3 (i_3 = 2): R1 (1111…)

In Step 6, when we delete R4 from Bucket 2, what is left in Bucket 2 (R3) can be combined with the content of Bucket 4 (R5). Therefore, we combine R3 and R5 into Bucket 2; return Bucket 4 to the operating system; and reduce the hash prefix of Bucket 2 by 1. Because no bucket now uses three hash bits, we also reduce the hash prefix of the BAT by 1; reduce the size of the BAT by one half; and adjust the pointers in the BAT.
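The deletion in Step 6 can be sketched in the same spirit. The code below builds the Step 5 state by hand and then deletes R4. It is an illustration under assumptions: the names `Bucket` and `delete` are invented here, only the 8 hash bits shown in the table are used, a merge is attempted whenever the two sibling buckets fit into one (as in the notes), and only a single merge is attempted per deletion.

```python
B = 8  # assumption: only the 8 hash bits shown in the table are used


class Bucket:
    def __init__(self, prefix, records):
        self.prefix = prefix
        self.records = dict(records)   # record name -> hash value


def delete(bat, i, name, h, capacity=2):
    """Delete one record, then try a single merge and BAT halving.

    Returns the (possibly smaller) BAT and its hash prefix.
    """
    entry = h >> (B - i)
    bucket = bat[entry]
    del bucket.records[name]

    if bucket.prefix > 0:
        # The sibling bucket differs from this one only in the last prefix bit.
        sibling = bat[entry ^ (1 << (i - bucket.prefix))]
        if (sibling is not bucket and sibling.prefix == bucket.prefix
                and len(bucket.records) + len(sibling.records) <= capacity):
            bucket.records.update(sibling.records)   # combine the contents
            bucket.prefix -= 1
            # Redirect the sibling's entries; the sibling bucket can be freed.
            bat = [bucket if b is sibling else b for b in bat]

    # Halve the BAT while no bucket needs all i bits.
    while i > 0 and all(b.prefix < i for b in bat):
        bat = bat[::2]
        i -= 1
    return bat, i


# State after Step 5, built by hand from the figure.
b1 = Bucket(1, {"R2": 0b01011000})
b2 = Bucket(3, {"R3": 0b10100011, "R4": 0b10110101})
b3 = Bucket(2, {"R1": 0b11110001})
b4 = Bucket(3, {"R5": 0b10000111})
bat = [b1, b1, b1, b1, b4, b2, b3, b3]   # entries 000 to 111

bat, i = delete(bat, 3, "R4", 0b10110101)
print(i, [sorted(b.records) for b in bat])
```

Running the sketch leaves a BAT with prefix 2 whose four entries reach R2, R2, R3 with R5, and R1, matching the state shown for Step 6. Halving with `bat[::2]` is safe here because, once every bucket prefix is smaller than i, each pair of adjacent BAT entries points to the same bucket.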