NOTES ON DYNAMIC HASHING

Dynamic hashing techniques allow the hash structure to be modified dynamically to accommodate the growth or shrinkage of the database. One such technique is extendable hashing.

[Figure 1: Architecture of Extendable Hashing. Labels: R(K), Hash Function, i (Hash Prefix), Bucket Address Table with entries 00, 01, 10, 11, Bucket 1 (i_1), Bucket 2 (i_2), Bucket 3 (i_3).]

The architecture of extendable hashing consists of:
1. A hash function
2. A Bucket Address Table (BAT), which is a variable-size index table
3. A variable number of buckets in the prime data area

In extendable hashing, we choose a hash function that generates values over a relatively large range, namely b-bit binary integers. Typically, b = 32. We do not create 2^32 (~4 billion) buckets, one for each hash value. Instead, we create buckets on demand, as records are inserted into the file. Nor do we use all b bits of the hash value initially. At any point, we use only the first i bits, where 0 <= i <= b. Here, i is called the hash prefix. Each bucket, as well as the BAT, has an assigned hash prefix.
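The role of the hash prefix can be sketched in a few lines of Python. This is only an illustration: the function names `hash_prefix` and `bat_entries` are made up for this sketch, and the sample hash value is a hypothetical 32-bit number whose leading bits are 11110001, padded with zeros.

```python
B = 32  # width b of the hash value in bits ("Typically, b = 32")


def hash_prefix(h: int, i: int, b: int = B) -> int:
    """Return the first (high-order) i bits of the b-bit hash value h."""
    if not 0 <= i <= b:
        raise ValueError("i must satisfy 0 <= i <= b")
    if not 0 <= h < (1 << b):
        raise ValueError("h must be a non-negative b-bit integer")
    return h >> (b - i)


def bat_entries(i: int) -> int:
    """Number of BAT entries when the BAT's hash prefix is i."""
    return 1 << i


# Hypothetical 32-bit hash value: leading bits 11110001, remaining bits zero.
h = 0b11110001 << 24
print(hash_prefix(h, 0), bat_entries(0))                  # 0 1
print(format(hash_prefix(h, 3), "03b"), bat_entries(3))   # 111 8
```

With i = 0 every hash value maps to the single BAT entry; each extra bit doubles the number of entries.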

Example of Extendable Hashing

Assume that bucket capacity = 2 records. We have five records, inserted into the hashed file in the order shown in the table below. The key values of the records hash to the bit strings shown.

Loading Sequence   Key   Hash Value
R1                 K1    11110001…
R2                 K2    01011000…
R3                 K3    10100011…
R4                 K4    10110101…
R5                 K5    10000111…

BAT (i = 0)
  (single entry) -> Bucket 1 (i_1 = 0): Empty

Initial State: There is only one empty bucket, and one entry in the BAT containing a pointer to the empty bucket.
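As a rough check on the table, the sketch below groups the five records by their first i bits and looks for the smallest i at which no group exceeds the bucket capacity. Only the eight bits shown in the table are used (the remaining bits are elided in the source), and the helper names are hypothetical. The result says how long the BAT's hash prefix has to become once all five records are loaded; it does not say how many buckets exist, since individual buckets may use shorter prefixes.

```python
from collections import defaultdict

CAPACITY = 2  # bucket capacity from the example

# Leading 8 bits of each hash value, as listed in the table.
HASHES = {
    "R1": "11110001",
    "R2": "01011000",
    "R3": "10100011",
    "R4": "10110101",
    "R5": "10000111",
}


def group_by_prefix(hashes, i):
    """Group record names by the first i bits of their hash value."""
    groups = defaultdict(list)
    for name, bits in hashes.items():
        groups[bits[:i]].append(name)
    return dict(groups)


def smallest_sufficient_prefix(hashes, capacity):
    """Smallest i for which every prefix group fits into one bucket."""
    width = min(len(bits) for bits in hashes.values())
    for i in range(width + 1):
        groups = group_by_prefix(hashes, i)
        if all(len(names) <= capacity for names in groups.values()):
            return i
    return None  # the known bits are not enough to separate the records


for i in range(4):
    print(i, group_by_prefix(HASHES, i))
print(smallest_sufficient_prefix(HASHES, CAPACITY))  # 3
```

With two bits, R3, R4, and R5 all share the prefix 10 and would overflow one bucket; a third bit separates R5 (100) from R3 and R4 (101).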

Steps 1 and 2: Insert R1 and then R2.

BAT (i = 0)
  (single entry) -> Bucket 1 (i_1 = 0): R1 (1111…), R2 (0101…)

Step 3: Insert R3 (1010…)

BAT (i = 1)
  0 -> Bucket 1 (i_1 = 1): R2 (0101…)
  1 -> Bucket 2 (i_2 = 1): R1 (1111…), R3 (1010…)

In Step 3, because the only available bucket is full, the bucket is "split". That is, a new bucket is acquired; the number of bits used from the hash value is incremented by 1; the hash prefix of the BAT is incremented by 1; the size of the BAT is doubled; the records in Bucket 1 and the new record are distributed between Bucket 1 and the new bucket, Bucket 2; and the pointers in the BAT are appropriately adjusted.
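The split procedure can be sketched as a small in-memory model. This is a minimal sketch under stated assumptions, not a definitive implementation: the class and method names are hypothetical, hash values are limited to the 8 bits shown in the example, and the rule that records whose next bit is 1 move to the new bucket is a choice made here, so which physical bucket ends up holding which prefix may differ from the slides' numbering.

```python
class Bucket:
    def __init__(self, prefix_len):
        self.prefix_len = prefix_len   # the bucket's own hash prefix (i_j)
        self.records = []              # list of (hash_value, name) pairs


class ExtendableHashFile:
    def __init__(self, capacity=2, bits=8):
        self.capacity = capacity       # records per bucket
        self.bits = bits               # hash width b (8 here, an assumption)
        self.i = 0                     # hash prefix of the BAT
        self.bat = [Bucket(0)]         # 2**i pointers to buckets

    def _entry(self, h):
        return h >> (self.bits - self.i)   # first i bits of h

    def insert(self, h, name):
        bucket = self.bat[self._entry(h)]
        if len(bucket.records) < self.capacity:
            bucket.records.append((h, name))
            return
        if bucket.prefix_len == self.bits:
            raise OverflowError("no hash bits left to split on")
        if bucket.prefix_len == self.i:
            # Double the BAT: every old pointer now appears twice.
            self.bat = [b for b in self.bat for _ in range(2)]
            self.i += 1
        # Split: acquire a new bucket and use one more hash bit.
        bucket.prefix_len += 1
        new = Bucket(bucket.prefix_len)
        shift = self.bits - bucket.prefix_len
        old_records, bucket.records = bucket.records, []
        for rec in old_records:
            target = new if (rec[0] >> shift) & 1 else bucket
            target.records.append(rec)
        # Adjust pointers: entries whose newly used bit is 1 go to the new bucket.
        low = self.i - bucket.prefix_len
        for idx, b in enumerate(self.bat):
            if b is bucket and (idx >> low) & 1:
                self.bat[idx] = new
        self.insert(h, name)           # retry; may trigger another split

    def layout(self):
        """Map each BAT entry (as a bit string) to the record names it reaches."""
        result = {}
        for idx, b in enumerate(self.bat):
            key = format(idx, "0{}b".format(self.i)) if self.i else ""
            result[key] = sorted(name for _, name in b.records)
        return result


f = ExtendableHashFile()
for name, h in [("R1", 0b11110001), ("R2", 0b01011000), ("R3", 0b10100011)]:
    f.insert(h, name)
print(f.i, f.layout())  # 1 {'0': ['R2'], '1': ['R1', 'R3']}
```

The BAT is doubled only when the overflowing bucket's prefix is as long as the BAT's; otherwise the bucket splits and the existing pointers are redistributed without growing the table.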

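Step 4: Insert R4 (1011…)

BAT (i = 2)
  00, 01 -> Bucket 1 (i_1 = 1): R2 (0101…)
  10     -> Bucket 2 (i_2 = 2): R3 (1010…), R4 (1011…)
  11     -> Bucket 3 (i_3 = 2): R1 (1111…)

This is another example of bucket splitting. R4 belongs in Bucket 2, which is already full, so Bucket 2 is split and the BAT is doubled again. Bucket 1 is not split: its hash prefix stays at 1, so both BAT entries beginning with 0 (00 and 01) point to it.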

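Step 5: Insert R5 (1000…)

BAT (i = 3)
  000, 001, 010, 011 -> Bucket 1 (i_1 = 1): R2 (0101…)
  100                -> Bucket 4 (i_4 = 3): R5 (1000…)
  101                -> Bucket 2 (i_2 = 3): R3 (1010…), R4 (1011…)
  110, 111           -> Bucket 3 (i_3 = 2): R1 (1111…)

Yet another example of bucket splitting. R5 belongs in Bucket 2 (prefix 10), which is full, so the BAT is doubled to i = 3 and a third bit separates R5 (100) from R3 and R4 (101).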

Step 6: Delete R4 (1011…)

BAT (i = 2)
  00, 01 -> Bucket 1 (i_1 = 1): R2 (0101…)
  10     -> Bucket 2 (i_2 = 2): R3 (1010…), R5 (1000…)
  11     -> Bucket 3 (i_3 = 2): R1 (1111…)

In Step 6, when we delete R4 from Bucket 2, what is left in Bucket 2 (R3) can be combined with the content of Bucket 4 (R5). Therefore, we combine R3 and R5 into Bucket 2; return Bucket 4 to the operating system; reduce the hash prefix of the BAT and of Bucket 2 by 1; reduce the size of the BAT by one half; and adjust the pointers in the BAT.
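The merge in Step 6 can be sketched as follows. The code is a hedged illustration rather than a reference implementation: the `Bucket` class and `delete` function are hypothetical names, hash values are cut to the 8 bits shown in the example, and the merge policy (combine two sibling buckets whenever their records fit into one bucket) is taken from the description above; other policies, such as merging only when a bucket becomes empty, are also possible.

```python
class Bucket:
    def __init__(self, prefix_len, records):
        self.prefix_len = prefix_len       # the bucket's hash prefix (i_j)
        self.records = list(records)       # (hash_value, name) pairs


def delete(bat, i, h, bits=8, capacity=2):
    """Delete the record with hash h; return the (possibly smaller) BAT and i."""
    idx = h >> (bits - i)
    bucket = bat[idx]
    bucket.records = [r for r in bucket.records if r[0] != h]
    while bucket.prefix_len > 0:
        # The sibling covers the prefix that differs only in the last prefix bit.
        buddy = bat[idx ^ (1 << (i - bucket.prefix_len))]
        if (buddy.prefix_len != bucket.prefix_len
                or len(bucket.records) + len(buddy.records) > capacity):
            break
        # Combine both buckets into one and shorten its prefix by one bit.
        bucket.records += buddy.records
        bucket.prefix_len -= 1
        bat = [bucket if b is buddy else b for b in bat]
        # Halve the BAT while no bucket needs all i bits.
        while i > 0 and all(b.prefix_len < i for b in bat):
            bat = bat[::2]
            i -= 1
            idx >>= 1
    return bat, i


# State after Step 5 (BAT prefix i = 3), rebuilt by hand.
R1, R2, R3 = (0b11110001, "R1"), (0b01011000, "R2"), (0b10100011, "R3")
R4, R5 = (0b10110101, "R4"), (0b10000111, "R5")
b1, b2 = Bucket(1, [R2]), Bucket(3, [R3, R4])
b3, b4 = Bucket(2, [R1]), Bucket(3, [R5])
bat = [b1, b1, b1, b1, b4, b2, b3, b3]

bat, i = delete(bat, 3, R4[0])
print(i, [[name for _, name in b.records] for b in bat])
# 2 [['R2'], ['R2'], ['R3', 'R5'], ['R1']]
```

After the merge no bucket uses all three bits, so every pair of adjacent BAT entries points to the same bucket and the table can be halved by keeping every second entry.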