CPSC 335, Dr. Marina Gavrilova, Computer Science, University of Calgary, Canada

OUTLINE
- Extendible hashing
- Expandable and dynamic hashing
- Virtual hashing
- Summary

Hash Functions for Extendible Hashing
- Standard hashing works on a file of fixed size.
- What if we add or delete many keys? What if the file size changes significantly?
- Then we need separate techniques. There are two types:
  - directory schemes
  - directoryless schemes

Extendible Hashing
- Keys are stored in buckets.
- Each bucket can hold only a fixed number of items.
- The index is an extendible table: h(x) hashes a key value x to a bit string, and only a portion of that bit string is used to build the directory.

Example (bucket size 3, directory depth 2): add key kn with h(kn) = 11011.

Before adding kn:
  00 -> b00: 00110, 00101
  01 -> b01: 01100, 01011
  10, 11 -> b1: 10011, 11110, 11111 (full)

After adding kn, bucket b1 is split into b10 and b11:
  00 -> b00: 00110, 00101
  01 -> b01: 01100, 01011
  10 -> b10: 10011
  11 -> b11: 11011, 11110, 11111
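The lookup step of this example can be sketched in Python. The representation is our own assumption, not the slide's: pseudokeys are kept as 5-bit strings, buckets are Python lists, and the directory is a dictionary from 2-bit prefixes to buckets; `find_bucket` and `DEPTH` are hypothetical names.

```python
# Slide example before kn is added: pseudokeys as 5-bit strings, buckets as lists.
b00 = ["00110", "00101"]
b01 = ["01100", "01011"]
b1 = ["10011", "11110", "11111"]  # full (bucket size 3), shared by entries 10 and 11

DEPTH = 2  # directory depth: number of leading bits used
directory = {"00": b00, "01": b01, "10": b1, "11": b1}


def find_bucket(pseudokey: str) -> list:
    """Return the bucket selected by the first DEPTH bits of the pseudokey."""
    return directory[pseudokey[:DEPTH]]


print(find_bucket("11011") is b1)          # True: kn falls into the full bucket b1
print(directory["10"] is directory["11"])  # True: two entries, one bucket
```

Because b1 is full but shared by two directory entries, it can be split into b10 and b11 without enlarging the directory.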

Hash Functions for Extendible Hashing
- Directory schemes
  - Extendible hashing (Fagin et al. 1979)
  - Expandable hashing (Knott 1971)
  - Dynamic hashing (Larson 1978)
- Directoryless schemes
  - Virtual hashing (Litwin 1978)

Extendible Hashing
- Size of a bucket = maximum number of pseudokeys it can hold (3 in our example).
- Once a bucket is full, it is split into two. Two situations are possible:
  - The directory keeps the same size (several entries pointed to the full bucket); only the pointers to the split bucket are adjusted.
  - The directory size grows from 2^k to 2^(k+1), i.e. the directory size can be 1, 2, 4, 8, 16, etc. (8 is shown in the figure).
- When the directory grows, the number of buckets stays the same apart from the split, i.e. several directory entries point to the same bucket.
- Finally, the bit string h(K) is used only to build the index; the actual key is stored in the bucket!

[Figure: a directory of size 8 with entries 000, 001, 010, 011, 100, 101, 110, 111]
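The two split situations can be sketched in Python. Assumptions (ours, not the slide's): pseudokeys are w-bit integers, each bucket records a local depth (how many leading bits all its keys share), and `Bucket` and `ExtendibleHash` are hypothetical names. A full bucket whose local depth is below the directory depth is split with pointer adjustments only; otherwise the directory is doubled first.

```python
class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth  # leading bits shared by every key in here
        self.keys: list[int] = []


class ExtendibleHash:
    """Directory indexed by the leftmost global_depth bits of a w-bit pseudokey."""

    def __init__(self, key_bits: int, bucket_size: int):
        self.w = key_bits
        self.bucket_size = bucket_size
        self.global_depth = 0  # directory has 2**global_depth entries
        self.directory = [Bucket(0)]

    def _index(self, h: int) -> int:
        return h >> (self.w - self.global_depth)  # leftmost global_depth bits

    def insert(self, h: int) -> None:
        while True:
            bucket = self.directory[self._index(h)]
            if len(bucket.keys) < self.bucket_size:
                bucket.keys.append(h)
                return
            if bucket.local_depth == self.w:
                raise ValueError("too many keys share all pseudokey bits")
            if bucket.local_depth == self.global_depth:
                # Situation 2: the directory grows from 2**k to 2**(k+1) entries;
                # new entry j takes the pointer of old entry j >> 1.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.global_depth += 1
            # Situation 1 (or the split after a doubling): split, adjust pointers.
            self._split(bucket)

    def _split(self, bucket: Bucket) -> None:
        d = bucket.local_depth + 1
        halves = (Bucket(d), Bucket(d))
        for k in bucket.keys:
            # The d-th leftmost bit of the key picks its new bucket.
            halves[(k >> (self.w - d)) & 1].keys.append(k)
        for i, b in enumerate(self.directory):
            if b is bucket:  # re-point every entry that referenced the old bucket
                self.directory[i] = halves[(i >> (self.global_depth - d)) & 1]


table = ExtendibleHash(key_bits=5, bucket_size=3)
for bits in ["00110", "00101", "01100", "01011", "10011", "11110", "11111", "11011"]:
    table.insert(int(bits, 2))
print(table.global_depth)  # 2
print([format(k, "05b") for k in table.directory[3].keys])  # ['11110', '11111', '11011']
```

Inserting the example's keys in this order ends in the state shown after kn is added: entry 10 holds 10011 alone and entry 11 holds the other three keys.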

Extendible Hashing
1. Use as much space as needed.
2. Input the file name and the number of words to insert. Use bucket size 128.
3. Use any function h(k) that returns a string of up to 32 bits (an integer type can be used).
4. Bucket: a char array.
5. Main idea: only the FIRST bits of the mask are used for search.
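Steps 3 and 5 can be sketched as follows. The choice of 32-bit FNV-1a as h is our own assumption (the slide allows any function returning up to 32 bits), and `h` and `first_bits` are hypothetical names.

```python
FNV_OFFSET = 0x811C9DC5  # FNV-1a 32-bit offset basis
FNV_PRIME = 0x01000193   # FNV-1a 32-bit prime


def h(word: str) -> int:
    """32-bit FNV-1a hash of the word's UTF-8 bytes (one possible choice of h)."""
    value = FNV_OFFSET
    for byte in word.encode("utf-8"):
        value ^= byte
        value = (value * FNV_PRIME) & 0xFFFFFFFF  # keep only 32 bits
    return value


def first_bits(value: int, depth: int) -> int:
    """Leading `depth` bits of a 32-bit pseudokey, i.e. its directory entry."""
    return value >> (32 - depth)


pk = h("hashing")
print(format(pk, "032b"))  # the 32-bit pseudokey as a bit string
print(first_bits(pk, 3))   # entry in a directory of 2**3 = 8 entries
```

Storing the pseudokey in an unsigned 32-bit integer, as step 3 suggests, makes "the first bits" a single right shift.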

Extendible Hashing
Assume that a hashing technique is applied to a dynamically changing file composed of buckets, and each bucket can hold only a fixed number of items. Extendible hashing accesses the data stored in buckets indirectly through an index that is dynamically adjusted to reflect changes in the file. The characteristic feature of extendible hashing is the organization of the index, which is an expandable table.

Extendible Hashing
- A hash function applied to a certain key indicates a position in the index and not in the file (the table of keys). Values returned by such a hash function are called pseudokeys.
- The file requires no reorganization when data are added to or deleted from it, since these changes are indicated in the index. Only one hash function h can be used, but depending on the size of the index, only a portion of the address h(K) is utilized.
- A simple way to achieve this effect is to look at the address as a string of bits from which only the i leftmost bits are used. The number i is the depth of the directory. In Figure 1(a) (next slide), the depth is two.
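The i-leftmost-bits rule can be written out as follows, where w is the length of the pseudokey in bits and b_{i+1}(K) is its (i+1)-th leftmost bit (both symbols are our notation, not the slide's). The second line is why increasing the depth only copies pointers: entry j of the deeper directory refines entry floor(j/2) of the shallower one.

```latex
\begin{align*}
\operatorname{index}_i(K) &= \left\lfloor h(K) / 2^{\,w-i} \right\rfloor,
  \qquad 0 \le \operatorname{index}_i(K) < 2^{i},\\
\operatorname{index}_{i+1}(K) &= 2\,\operatorname{index}_i(K) + b_{i+1}(K).
\end{align*}
```

For example, with h(kn) = 11011 (w = 5, value 27), depth 2 gives floor(27 / 8) = 3 = 11 in binary, and depth 3 gives floor(27 / 4) = 6 = 110 in binary, which equals 2 * 3 + 0.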

Extendible Hashing
Figure 1. An example of extendible hashing (Drozdek textbook).

Expandable & Dynamic Hashing
Expandable Hashing
- A similar idea to extendible hashing, but a binary tree is used to store the index on the buckets.
Dynamic Hashing
- Multiple binary trees are used. Outcome:
  - a shorter search;
  - the key determines which tree to search.
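A binary-tree index in the spirit of expandable hashing can be sketched as a binary trie over the pseudokey bits: internal nodes branch on the next leftmost bit and leaves hold buckets. This is a minimal sketch under our own assumptions (w-bit integer pseudokeys, a full leaf split in place, hypothetical names `Node` and `TrieIndex`), not Knott's exact algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A leaf holds a bucket; an internal node has exactly two children."""
    bucket: list[int] = field(default_factory=list)
    children: list[Node] | None = None


class TrieIndex:
    """Binary-tree index over w-bit integer pseudokeys."""

    def __init__(self, key_bits: int, bucket_size: int):
        self.w = key_bits
        self.bucket_size = bucket_size
        self.root = Node()

    def _bit(self, h: int, depth: int) -> int:
        # The (depth+1)-th leftmost bit of h chooses child 0 or child 1.
        return (h >> (self.w - 1 - depth)) & 1

    def _leaf(self, h: int) -> tuple[Node, int]:
        # Walk from the root to the leaf for h; return it with its depth.
        node, depth = self.root, 0
        while node.children is not None:
            node = node.children[self._bit(h, depth)]
            depth += 1
        return node, depth

    def search(self, h: int) -> bool:
        return h in self._leaf(h)[0].bucket

    def insert(self, h: int) -> None:
        node, depth = self._leaf(h)
        while len(node.bucket) >= self.bucket_size:
            if depth == self.w:
                raise ValueError("too many keys share all pseudokey bits")
            # Turn the full leaf into an internal node with two new leaves.
            node.children = [Node(), Node()]
            for k in node.bucket:
                node.children[self._bit(k, depth)].bucket.append(k)
            node.bucket = []
            node = node.children[self._bit(h, depth)]
            depth += 1
        node.bucket.append(h)


t = TrieIndex(key_bits=5, bucket_size=3)
for bits in ["00110", "00101", "01100", "01011", "10011", "11110", "11111", "11011"]:
    t.insert(int(bits, 2))
print(t.search(0b11011))  # True
print(t.search(0b00000))  # False
```

Compared with a directory, the tree stores one leaf per bucket, so no entries are duplicated, at the cost of a root-to-leaf walk per lookup.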

Dynamic Hashing
- Larson's method.
- The index is simplified to a set of binary trees.
- The height of each tree is limited.
- h(x) is searched in ALL trees.
- Time: with m trees of at most k keys each, the overall search cost is m * lg k.
- Advantage: shorter search time in the index file.
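A forest of height-limited trees can be sketched as below. This sketch follows the variant in which the key selects which tree to search, so only one tree is walked per lookup. Every concrete choice is our own assumption: the tree is chosen by `key % m`, the tree is walked with the key's own w-bit pattern, each tree is stored as a dictionary from leaf path to bucket, and a full leaf at the height limit simply overflows. `DynamicHashForest` is a hypothetical name.

```python
class DynamicHashForest:
    """Hypothetical sketch: m binary trees of bounded height over integer keys."""

    def __init__(self, m: int, bucket_size: int, max_height: int, key_bits: int = 16):
        if max_height > key_bits:
            raise ValueError("max_height cannot exceed key_bits")
        self.m = m
        self.bucket_size = bucket_size
        self.max_height = max_height
        self.w = key_bits
        # Each tree maps a leaf path ("", "0", "01", ...) to its bucket.
        self.trees = [{"": []} for _ in range(m)]

    def _bits(self, key: int) -> str:
        # Assumption: the tree is walked with the key's own w-bit pattern.
        return format(key, f"0{self.w}b")

    def _leaf(self, key: int) -> tuple[dict, str]:
        tree = self.trees[key % self.m]  # assumption: the key selects its tree
        bits, path = self._bits(key), ""
        while path not in tree:  # internal node: go one level deeper
            path += bits[len(path)]
        return tree, path

    def search(self, key: int) -> bool:
        tree, path = self._leaf(key)
        return key in tree[path]

    def insert(self, key: int) -> None:
        tree, path = self._leaf(key)
        bucket = tree[path]
        if len(bucket) < self.bucket_size or len(path) == self.max_height:
            bucket.append(key)  # at the height limit the bucket just overflows
            return
        # Split the full leaf into two leaves one level deeper, then retry.
        del tree[path]
        tree[path + "0"], tree[path + "1"] = [], []
        for k in bucket:
            tree[path + self._bits(k)[len(path)]].append(k)
        self.insert(key)


forest = DynamicHashForest(m=2, bucket_size=2, max_height=3, key_bits=8)
for key in range(0, 40, 3):
    forest.insert(key)
print(forest.search(9))  # True
print(forest.search(1))  # False
```

Limiting the height bounds the walk inside each tree at max_height steps, which is the point of the height restriction on the slide.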

Virtual Hashing
Litwin's Virtual Hashing
- Expand buckets in a linear fashion.
- Store them contiguously in memory.
- No table is needed; the procedure is simple.
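Directoryless linear growth can be illustrated with a sketch in the style of linear hashing, Litwin's later (1980) directoryless scheme; the 1978 virtual hashing procedure may differ in detail. Buckets live in one contiguous list, are split in linear order by a `next_split` pointer, and each new bucket is appended at the end, so no table is needed. The class name, the initial bucket count, and the load-factor trigger are assumptions.

```python
class LinearHashFile:
    """Hypothetical sketch of directoryless, linear bucket growth."""

    def __init__(self, initial_buckets: int = 2, bucket_size: int = 2):
        self.n0 = initial_buckets
        self.bucket_size = bucket_size
        self.level = 0       # completed rounds of doubling
        self.next_split = 0  # next bucket to split, in linear order
        self.count = 0
        self.buckets = [[] for _ in range(initial_buckets)]  # one contiguous array

    def _address(self, key: int) -> int:
        a = key % (self.n0 * 2 ** self.level)
        if a < self.next_split:  # that bucket was already split in this round
            a = key % (self.n0 * 2 ** (self.level + 1))
        return a

    def search(self, key: int) -> bool:
        return key in self.buckets[self._address(key)]

    def insert(self, key: int) -> None:
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count > self.bucket_size * len(self.buckets):  # load factor above 1
            self._split()

    def _split(self) -> None:
        # Split bucket next_split; its new partner is appended at the end.
        old = self.buckets[self.next_split]
        self.buckets[self.next_split] = []
        self.buckets.append([])
        modulus = self.n0 * 2 ** (self.level + 1)
        for k in old:
            self.buckets[k % modulus].append(k)
        self.next_split += 1
        if self.next_split == self.n0 * 2 ** self.level:  # round finished
            self.level += 1
            self.next_split = 0


lh = LinearHashFile()
for key in range(50):
    lh.insert(key)
print(lh.search(17), lh.search(100))  # True False
print(len(lh.buckets))  # 25
```

The address of a key is computed from the file size alone (level and next_split), which is what lets this family of schemes drop the directory.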

Summary
Extendible hashing advantages:
- The initially allocated space can grow indefinitely.
- Locating the bucket where a key belongs requires only a very fast comparison of bits.
- Very flexible in choosing the bucket size, and buckets can be stored on disk or accessed in remote memory.
Extendible hashing disadvantages:
- Increased algorithm complexity.
- Extra memory overhead to store the index inside the bucket.