Dynamic Hashing
• Good for a database that grows and shrinks in size
• Allows the hash function to be modified dynamically
  – When the hash function takes modulo 10, as in the previous example, the number of buckets is fixed at 10
  – The trick is to change the hash function so that the number of buckets can change
  – And, at the same time, to avoid rehashing the existing records!
  – Imagine changing modulo 10 to modulo 13: every existing record would have to be rehashed, which is not a good idea
Department of Computer Science and Engineering, HKUST Slide 1
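To see why changing the modulus is costly, the minimal sketch below counts how many keys land in a different bucket when modulo 10 is replaced by modulo 13. The key set (the integers 0 to 999) is an assumption chosen purely for illustration and does not come from the slides.

```python
# Hypothetical key set: the integers 0..999 (an assumption, not from the slides).
keys = range(1000)

# A key "moves" if its bucket under modulo 13 differs from its bucket under modulo 10.
moved = sum(1 for k in keys if k % 10 != k % 13)

print(f"{moved} of {len(keys)} keys change bucket")
```

With this key set the large majority of keys change bucket, which is the cost that dynamic hashing tries to avoid.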
Extendable Hashing
• Extendable hashing is one form of dynamic hashing
  – The hash function generates values over a large range, typically b-bit integers, with b = 32
[Figure: a b-bit hash value, bits b-1 down to 0, with the leading i bits marked]
• At any time, use only a prefix of the b-bit integers to index into a table of bucket addresses. Let the length of the prefix be i bits, 0 ≤ i ≤ 32
• Initially i = 1, meaning that it can index at most 2 buckets
• When the 2 buckets are full, we can use 2 bits (i = 2), meaning that we can now index at most 4 buckets, and so on
• i grows and shrinks as the size of the database grows and shrinks
• The actual number of buckets is ≤ 2^i, and it may change due to bucket merging and splitting
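Taking an i-bit prefix of a b-bit hash value amounts to a right shift. The sketch below assumes b = 32 as on the slide; the function name `prefix` is hypothetical, and the sample value is the hash listed for Brighton in the example later in these slides.

```python
B = 32  # hash width in bits, as on the slide

def prefix(h: int, i: int) -> int:
    """Return the first i high-order bits of the B-bit hash value h."""
    return h >> (B - i)

# Hash value listed for Brighton in the example later in these slides.
h_brighton = 0b0010_1101_1111_1011_0010_1100_0011_0000

# Show how the table index changes as more prefix bits are used.
for i in range(1, 5):
    print(i, format(prefix(h_brighton, i), "b").zfill(i))
```

Growing i by one appends one more bit to the right of the index, so every old index k corresponds to the two new indices 2k and 2k + 1.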
Extendable Hash Structure: General Ideas
• Initially, i = 1: use 1 bit of the hash key, resulting in two entries in the hash address table
• Suppose we start with only 1 or 2 records; we need only 1 bucket initially
• Both entries in the hash address table point to the same bucket
• i_0 = 0 means no bit has been used to separate records in the bucket (i.e., all records are hashed into bucket 0 irrespective of any bit setting in the hash key values)
Extendable Hash Structure: Bucket Expansion
• Suppose bucket 0 is full and a new record arrives
• Create a new bucket and rehash the three records (the two existing ones and the new record) into buckets 0 and 1, according to the one hash-key bit now in use (i = 1)
General Extendable Hash Structure
[Figure labels: new record; record originally in bucket 0]
• Note: why do we need to keep i, i_0 and i_1?
• i is the maximum number of bits used in hashing so far; i_0 and i_1 are the numbers of bits used for these particular buckets
General Extendable Hash Structure
1) Upon insertion of a new (red) record, bucket 0 is full again
2) Bucket 2 is created, and the three records (the two existing ones and the new one) are rehashed among buckets 0 and 2 based on the second bit
General Extendable Hash Structure
• Use the first 2 bits of the hash key to address the 4 entries in the table
• 2 bits of the hash key have been used to hash the records: use the 3rd bit in the next split
• Bucket 2 is new
• Bucket 1 is not changed
• 1 bit of the hash key has been used to hash the records: use the 2nd bit in the next split
Extendable Hash Structure: Properties
• Every expansion doubles the number of entries in the table
• Multiple entries in the bucket address table may point to the same bucket. This means that the bucket has not been expanded while other buckets have been expanded, possibly multiple times
• Each bucket j stores a value i_j; all records in the same bucket have the same values on the first i_j bits of their hash keys
• To locate the bucket containing search-key K_j:
  1. Compute h(K_j) = X
  2. Use the first i high-order bits of X to look up the hash address table, and follow the pointer to the appropriate bucket
• To insert a record with search-key value K_j, look up the bucket where it should belong, say j. If there is room in bucket j, insert the record in the bucket; otherwise the bucket must be split and the insertion re-attempted
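The lookup steps above can be sketched as follows. The directory contents and the bucket labels "A", "B", "C" are hypothetical; the hash values are the ones listed for Brighton, Downtown and Mianus in the example later in these slides, and b = 32 is assumed.

```python
B = 32  # hash width in bits

# Hypothetical directory with i = 2: entries 00 and 01 share bucket "A"
# (so i_A = 1), while entries 10 and 11 have their own buckets (i_B = i_C = 2).
i = 2
table = ["A", "A", "B", "C"]

def locate(h: int) -> str:
    """Use the first i high-order bits of h as an index into the table."""
    return table[h >> (B - i)]

h_brighton = 0b0010_1101_1111_1011_0010_1100_0011_0000
h_downtown = 0b1010_0011_1010_0000_1100_0110_1001_1111
h_mianus = 0b1100_0111_1110_1101_1011_1111_0011_1010

for name, h in [("Brighton", h_brighton), ("Downtown", h_downtown), ("Mianus", h_mianus)]:
    print(name, "->", locate(h))
```

Because entries 00 and 01 point to the same bucket, any hash value starting with 0 reaches bucket "A" regardless of its second bit.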
Split in Extendable Hash Structure
To split a bucket j when inserting a record with search-key value K_j:
• If i > i_j (more than one pointer to bucket j)
  – allocate a new bucket z, and set i_j and i_z to the old i_j + 1
  – make the second half of the bucket address table entries that point to j point to z
  – remove and reinsert each record in bucket j
  – recompute the new bucket for K_j and insert the record in that bucket (further splitting is required if the bucket is still full)
[Figure annotation: "could be any number > 1"]
Split in Extendable Hash Structure
To split a bucket j when inserting a record with search-key value K_j:
• If i = i_j (only one pointer to bucket j)
  – increment i and double the size of the bucket address table
  – replace each entry in the table by two entries that point to the same bucket
  – recompute the bucket address table entry for K_j; now i > i_j, so use the first case above
[Figure: buckets with i_0 = 2, i_1 = 2, i_2 = 2, i_3 = 2, and a new record]
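The two table manipulations used by a split (doubling the table, then redirecting the second half of the pointers) can be illustrated on a toy directory. This is only a sketch: buckets are represented by the hypothetical labels "A", "B" and "C", and the function names are not from the slides.

```python
def double_table(table):
    """Replace each entry by two adjacent entries pointing to the same bucket."""
    return [entry for entry in table for _ in range(2)]

def redirect_second_half(table, old, new):
    """Make the second half of the entries that point to `old` point to `new`."""
    slots = [k for k, entry in enumerate(table) if entry == old]
    for k in slots[len(slots) // 2:]:
        table[k] = new
    return table

# Toy directory with i = 1; bucket "B" is full and has only one pointer (i = i_B).
table = ["A", "B"]
table = double_table(table)            # i becomes 2; "B" now has two pointers
redirect_second_half(table, "B", "C")  # new bucket "C" takes the second half
print(table)
```

After these two steps the records of the old bucket would still have to be removed and reinserted, as described in the first case.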
Example: Use of Extendable Hash Structure

Branch-name   h(branch-name)
Brighton      0010 1101 1111 1011 0010 1100 0011 0000
Downtown      1010 0011 1010 0000 1100 0110 1001 1111
Mianus        1100 0111 1110 1101 1011 1111 0011 1010
Perryridge    1111 0001 0010 0100 1001 0011 0110 1101
Redwood       0011 0101 1010 0110 1100 1001 1110 1011
Round Hill    1101 1000 0011 1111 1001 1100 0001

Initial hash structure, bucket size = 2
[Figure: hash address table (0 bits used) pointing to Bucket 0]
Example
Insert: Brighton, A-217, 750
  h(Brighton) = 0010 1101 1111 1011 0010 1100 0011 0000
  No bit is needed from the hash value (i = 0)
Bucket (0 bits used): Brighton, A-217, 750
Example
Insert: Downtown, A-101, 500
  h(Downtown) = 1010 0011 1010 0000 1100 0110 1001 1111
  No bit is needed from the hash value (i = 0)
Bucket (0 bits used): Brighton, A-217, 750; Downtown, A-101, 500
Insert: Downtown, A-101, 600
  Bucket full; split records according to the first bit (i = 1)
Example
Hash values:
  Brighton  0010 1101 1111 1011 0010 1100 0011 0000
  Downtown  1010 0011 1010 0000 1100 0110 1001 1111
Bucket (1 bit used): Brighton, A-217, 750
Bucket (1 bit used): Downtown, A-101, 500; Downtown, A-101, 600
Insert: Mianus, A-215, 700
  h(Mianus) = 1100 0111 1110 1101 1011 1111 0011 1010
  Hashes into bucket 1, which is full
Example
Hash values:
  Mianus    1100 0111 1110 1101 1011 1111 0011 1010
  Downtown  1010 0011 1010 0000 1100 0110 1001 1111
Bucket (1 bit has been used to allocate records): Brighton, A-217, 750
Bucket (2 bits have been used to allocate records): Downtown, A-101, 500; Downtown, A-101, 600
Bucket (2 bits used): Mianus, A-215, 700
Note: the directory size is doubled
Example
Insert: Perryridge, A-102, 400
  h(Perryridge) = 1111 0001 0010 0100 1001 0011 0110 1101
Bucket (1 bit used): Brighton, A-217, 750
Bucket (2 bits used): Downtown, A-101, 500; Downtown, A-101, 600
Bucket (2 bits used): Mianus, A-215, 700; Perryridge, A-102, 400
Insert: Perryridge, A-201, 900
  h(Perryridge) = 1111 0001 0010 0100 1001 0011 0110 1101
  Bucket 2 overflows again!
Example
[Figure: hash address table expanded from 4 to 8 entries]
• Expand the hash address table and renumber the entries by adding one more bit
• Bucket 10 becomes bucket 10x, where x could be 0 or 1
• Bucket 11 becomes buckets 110 and 111, because it is split; a new bucket is added and the local level is updated
• Redistribute the overflowing records and the new record
Bucket (1 bit used): Brighton, A-217, 750
Bucket (2 bits used): Downtown, A-101, 500; Downtown, A-101, 600
Bucket (local level updated from 2 to 3): Mianus, A-215, 700; Perryridge, A-102, 400
New bucket (3 bits used): Perryridge, A-201, 900
Example
Insert: Perryridge, A-218, 700
  h(Perryridge) = 1111 0001 0010 0100 1001 0011 0110 1101
  Bucket 3 overflows again! All three Perryridge records have the same hash value, so no further split can separate them
Bucket (1 bit used): Brighton, A-217, 750
Bucket (2 bits used): Downtown, A-101, 500; Downtown, A-101, 600
Bucket (3 bits used): Mianus, A-215, 700
Bucket (3 bits used): Perryridge, A-201, 900; Perryridge, A-102, 400; Perryridge, A-218, 700
Example
Bucket (1 bit used): Brighton, A-217, 750; Redwood, A-222, 700
Bucket (2 bits used): Downtown, A-101, 500; Downtown, A-101, 600
Bucket (3 bits used): Mianus, A-215, 700; Round Hill, A-305, 350
Bucket (3 bits used): Perryridge, A-201, 900; Perryridge, A-102, 400; Perryridge, A-218, 700
Done!
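The whole example can be replayed with a small sketch. It is not a definitive implementation and rests on several assumptions: only the leading 8 bits of each listed hash value are used (so b = 8 here), an overflow bucket is modelled simply by letting the bucket's list grow, and overflow is chosen whenever a full bucket and the new record all share one hash value. The class and method names are hypothetical.

```python
B = 8            # assumption: use only the leading 8 bits of each listed hash value
BUCKET_SIZE = 2  # bucket size from the example

class Bucket:
    def __init__(self, depth):
        self.depth = depth   # i_j: number of prefix bits used for this bucket
        self.records = []    # list of (hash, record) pairs

class ExtendableHash:
    def __init__(self):
        self.i = 0                 # number of prefix bits used by the table
        self.table = [Bucket(0)]   # bucket address table with 2**i entries

    def _index(self, h):
        return h >> (B - self.i)   # first i high-order bits of h

    def insert(self, h, record):
        while True:
            bucket = self.table[self._index(h)]
            if len(bucket.records) < BUCKET_SIZE:
                bucket.records.append((h, record))
                return
            all_same = all(x == h for x, _ in bucket.records)
            if all_same or bucket.depth == B:
                # Splitting cannot help; growing the list stands in for an overflow bucket.
                bucket.records.append((h, record))
                return
            self._split(bucket)    # then re-attempt the insertion

    def _split(self, bucket):
        if bucket.depth == self.i:
            # Only one pointer to the bucket: double the table first.
            self.table = [b for b in self.table for _ in range(2)]
            self.i += 1
        bucket.depth += 1
        new = Bucket(bucket.depth)
        slots = [k for k, b in enumerate(self.table) if b is bucket]
        for k in slots[len(slots) // 2:]:
            self.table[k] = new    # second half of the pointers goes to the new bucket
        old, bucket.records = bucket.records, []
        for h, rec in old:         # remove and reinsert each record
            self.table[self._index(h)].records.append((h, rec))

inserts = [
    (0b00101101, "Brighton, A-217, 750"),
    (0b10100011, "Downtown, A-101, 500"),
    (0b10100011, "Downtown, A-101, 600"),
    (0b11000111, "Mianus, A-215, 700"),
    (0b11110001, "Perryridge, A-102, 400"),
    (0b11110001, "Perryridge, A-201, 900"),
    (0b11110001, "Perryridge, A-218, 700"),
    (0b00110101, "Redwood, A-222, 700"),
    (0b11011000, "Round Hill, A-305, 350"),
]

eh = ExtendableHash()
for h, rec in inserts:
    eh.insert(h, rec)

seen = []
for b in eh.table:                 # print each distinct bucket once
    if not any(b is s for s in seen):
        seen.append(b)
        print(b.depth, [rec for _, rec in b.records])
```

Under these assumptions the sketch ends with i = 3 and four distinct buckets whose grouping of records matches the final state shown above, although the order of records within a bucket may differ.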
Updates in Extendable Hash Structure
• When inserting a value, if the bucket is still full after several splits (that is, i reaches some limit b), create an overflow bucket instead of splitting the bucket address table further
• To delete a key value, locate it in its bucket and remove it. The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table). Coalescing of buckets and decreasing the bucket address table size are also possible
Extendable Hashing is not Pure Hashing
• Pure hashing maps a key value directly to the bucket where the record containing the key value can be found
• Extendable hashing maps a key value to the entry in the hash prefix table, which contains a pointer to the bucket where the record containing the key value can be found
• The hash prefix table can be considered a complete binary tree; that is, extendable hashing is a combination of tree and hashing!
[Figure: hash prefix table entries 00, 01, 10, 11]