Hashing

Hashing
• Hashing is another method for storing and searching data.
  – Hashing makes it easier to add and remove elements from a data structure.
  – The worst-case behavior for locating a key is linear: Θ(n).
  – Java’s standard hash table class is java.util.Hashtable.

Hashing
• Hashing is usually implemented with a data structure called a hash table.
  – A hash table is an effective data structure for storing and retrieving data by key.
  – A hash table is a generalization of an array.
  – A hash table requires a key to access data.

Hashing
  – A hash table uses an array whose length is proportional to the number of keys actually stored.
  – The array index is computed from the key, rather than using the key directly as the array index.
    • The key is a unique identifying value.

Hashing Functions
• Hashing requires the use of a hashing function.
  – The purpose of the hashing function is to compute the storage slot from the key.
    • It maps key values to array indices.
  – This calculation reduces the range of array indices that need to be handled.

Hashing Functions
  – If a hashing function groups key values together, this is called clustering of the keys.
    • A good hashing function distributes the key values uniformly through the array’s index range.
    • Any hashing function that results in clustering should be changed.
    • A good hashing function has an equal likelihood of hashing a key into any of the slots.
    • java.util.Hashtable obtains each key’s hash value by calling the key’s hashCode method.
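
To make clustering concrete, the following sketch counts how many keys land in each slot. It assumes integer keys and uses the remainder of the key's hash code as the slot; the class and method names (SlotDistribution, slotCounts) are illustrative and not part of the original slides.

```java
// Illustrative sketch: count how many keys fall into each slot of a table.
class SlotDistribution {
    // Assumed mapping: slot = hashCode mod tableLength (kept non-negative).
    static int[] slotCounts(int[] keys, int tableLength) {
        int[] counts = new int[tableLength];
        for (int key : keys) {
            int slot = Math.floorMod(Integer.hashCode(key), tableLength);
            counts[slot]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
        // With 10 slots every multiple of 10 lands in slot 0 (clustering).
        System.out.println(java.util.Arrays.toString(slotCounts(keys, 10)));
        // With 11 slots the same keys spread over ten different slots.
        System.out.println(java.util.Arrays.toString(slotCounts(keys, 11)));
    }
}
```

The same keys cluster or spread depending on how the hash value is reduced to an index, which is why a clustering function should be changed.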

Hashing Functions
• The division hash function depends upon the remainder of division.
  – Math.abs(H(k)) % table.length
  – When using the division hash function, it is best to have a table size that is a prime number of the form 4n + 3.
  – Using the division hash function can result in many collisions.
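
A minimal sketch of the division hash function follows. It swaps Math.floorMod in for Math.abs(...) % table.length, because Math.abs(Integer.MIN_VALUE) is still negative; for non-negative hash codes both give the same index. The class name DivisionHash and the table size 811 (a prime of the form 4n + 3 with n = 202) are choices made for this example.

```java
// Sketch of a division hash function (names are illustrative).
class DivisionHash {
    // Maps any key to an index in [0, tableLength).
    static int indexFor(Object key, int tableLength) {
        // floorMod never returns a negative result for a positive divisor,
        // so it also handles hashCode() == Integer.MIN_VALUE safely.
        return Math.floorMod(key.hashCode(), tableLength);
    }

    public static void main(String[] args) {
        int tableLength = 811; // prime, 811 = 4 * 202 + 3
        System.out.println(indexFor("apple", tableLength));
        System.out.println(indexFor(Integer.MIN_VALUE, tableLength));
    }
}
```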

Hashing Functions
• The mid-square hash function converts the key to an integer, then squares it (multiplies it by itself). The function returns the middle digits of the result.
• The multiplicative hash function converts the key to an integer and multiplies it by a constant less than one. The function returns the first few digits of the fractional part of the result.
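
Both functions can be sketched briefly, assuming the key has already been converted to an int. The class name OtherHashes, the choice of decimal digits, and the constant 0.6180339887 (approximately (√5 − 1) / 2, a commonly suggested multiplier) are assumptions of this example rather than details from the slides.

```java
// Sketches of the mid-square and multiplicative hash functions.
class OtherHashes {
    // Mid-square: square the key, then return `digits` decimal digits
    // taken from the middle of the square.
    static int midSquare(int key, int digits) {
        long squared = (long) key * key;          // long avoids int overflow
        String s = Long.toString(squared);
        if (s.length() <= digits) {
            return (int) squared;                 // too short to trim
        }
        int start = (s.length() - digits) / 2;
        return Integer.parseInt(s.substring(start, start + digits));
    }

    // Multiplicative: multiply by a constant less than one and return the
    // first three digits of the fractional part (a value from 0 to 999).
    static int multiplicative(int key) {
        double constant = 0.6180339887;           // assumed multiplier
        double product = Math.abs((double) key) * constant;
        double fraction = product - Math.floor(product);
        return (int) (fraction * 1000);
    }

    public static void main(String[] args) {
        System.out.println(midSquare(1234, 3));   // 1234 * 1234 = 1522756
        System.out.println(multiplicative(123456));
    }
}
```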

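Example
[Figure: the hash function H maps the actual keys K (k1 through k5), a subset of the universe of keys U, to slots 0 through m-1 of the table. The slots H(k1), H(k2), H(k3), and H(k4) are labeled.]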

Collisions
• A collision occurs when the hashing function calculates the same array index for two different objects and one of them is already stored at that index.
  – Two keys hash to the same slot.
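
A small illustration of a collision: the strings "Aa" and "BB" are different keys, yet String.hashCode gives both the value 2112, so any table size maps them to the same slot. The class name CollisionDemo and the slot helper are illustrative.

```java
// Two different keys that hash to the same slot.
class CollisionDemo {
    // Assumed slot calculation: hash code reduced modulo the table length.
    static int slot(String key, int tableLength) {
        return Math.floorMod(key.hashCode(), tableLength);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());   // 65 * 31 + 97 = 2112
        System.out.println("BB".hashCode());   // 66 * 31 + 66 = 2112
        System.out.println(slot("Aa", 11) == slot("BB", 11));
    }
}
```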

Collision Example
[Figure: the actual keys K (k1 through k5), a subset of the universe of keys U, are mapped to table slots 0 through m-1. Keys k2 and k5 collide: H(k2) = H(k5).]

Open Addressing
• Open addressing ensures that all elements are stored directly in the hash table.
  – Every table slot contains either data or null.
  – The drawback is that the table can fill up.
  – The advantage is that no storage outside the table is needed for the elements.

Open Addressing
  – Open addressing resolves collisions by probing for another open slot in the table. Methods include linear probing and double hashing.

Linear Probing
• Linear probing resolves collisions by placing the data into the next open slot in the table.
• If the slot computed by the hash function is open, the data is stored in that slot.
• If the slot is not open, the algorithm looks at the next slot (index), and then the one after it, until an open slot is found.

Linear Probing
  – It is difficult to delete items from a hash table that uses open addressing.
    • A vacated slot cannot simply be set to null, because a later search would stop at the null and miss items stored beyond it. Instead, a special Deleted marker is placed in the slot.
  – If H’(k) is the ordinary hash function, the linear probing hash function is:
    • H(k, i) = (H’(k) + i) % m, where i = 0, 1, 2, …, m - 1 and m is the number of elements that can be stored in the table.
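
The following is a minimal sketch of open addressing with linear probing, using the probe sequence (H’(k) + i) % m and a Deleted marker for removed entries. The class name LinearProbingSet, the fixed capacity, and the use of hashCode modulo m as H’(k) are assumptions made for this example.

```java
// Sketch of an open-addressing set with linear probing and a Deleted marker.
class LinearProbingSet {
    private static final Object DELETED = new Object(); // marks removed entries
    private final Object[] slots;

    LinearProbingSet(int capacity) {
        slots = new Object[capacity];
    }

    // H'(k): the ordinary hash function, reduced to a table index.
    private int home(Object key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    // Returns the index holding key, or -1 if the key is absent.
    private int find(Object key) {
        int m = slots.length;
        for (int i = 0; i < m; i++) {
            int idx = (home(key) + i) % m;        // H(k, i) = (H'(k) + i) % m
            Object s = slots[idx];
            if (s == null) {
                return -1;                        // never-used slot ends the search
            }
            if (s != DELETED && s.equals(key)) {
                return idx;
            }
        }
        return -1;
    }

    boolean contains(Object key) {
        return find(key) >= 0;
    }

    boolean add(Object key) {
        if (contains(key)) {
            return false;
        }
        int m = slots.length;
        for (int i = 0; i < m; i++) {
            int idx = (home(key) + i) % m;
            if (slots[idx] == null || slots[idx] == DELETED) {
                slots[idx] = key;                 // reuse empty or deleted slots
                return true;
            }
        }
        throw new IllegalStateException("table is full");
    }

    boolean remove(Object key) {
        int idx = find(key);
        if (idx < 0) {
            return false;
        }
        slots[idx] = DELETED;                     // not null, so probes continue past it
        return true;
    }
}
```

With a capacity of 7, the keys 3, 10, and 17 all hash to slot 3 and occupy slots 3, 4, and 5. Removing 10 leaves a Deleted marker in slot 4, so a search for 17 still probes past it and succeeds.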

Linear Probing
  – A problem associated with linear probing is called primary clustering.
    • Primary clustering occurs when many items hash into the same slot or neighboring slots, so that long runs of consecutive slots fill up.
    • This results in increased search times.

Linear Probing
[Figure: the actual keys K (k1 through k5), a subset of the universe of keys U, are mapped to table slots 0 through m-1, with the collision H(k2) = H(k5).]

Double Hashing
• Double hashing is one of the best methods for dealing with collisions.
  – The slot location is first calculated from the hash function H1(k). If that slot is full, a second hash function H2(k) is calculated and combined with the first to give H(k, i), which determines a new slot.

Double Hashing
  – Assume that:
    • H1(k) = Math.abs(H(k)) % table.length
    • H2(k) = 1 + Math.abs(H(k)) % (table.length - x), where x is a small value: 1, 2, or 3.
  – Then:
    • H(k, i) = (H1(k) + i · H2(k)) % m, where m = table.length
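
As a worked sketch, the method below lists the slots visited by H(k, i) = (H1(k) + i · H2(k)) % m for i = 0 to m - 1. It assumes x = 2, takes the raw hash value as an int, and uses Math.floorMod in place of Math.abs(...) % ..., which gives the same result for non-negative hash values. The class and method names are illustrative.

```java
// Sketch: the probe sequence produced by double hashing (assumes x = 2).
class DoubleHashingProbe {
    static int[] probeSequence(int hash, int m) {
        int h1 = Math.floorMod(hash, m);           // H1(k)
        int h2 = 1 + Math.floorMod(hash, m - 2);   // H2(k), never zero
        int[] sequence = new int[m];
        for (int i = 0; i < m; i++) {
            // long arithmetic avoids overflow in i * h2 for large tables
            sequence[i] = (int) ((h1 + (long) i * h2) % m);
        }
        return sequence;
    }

    public static void main(String[] args) {
        // hash = 25, m = 11: H1 = 3, H2 = 1 + (25 % 9) = 8
        System.out.println(java.util.Arrays.toString(probeSequence(25, 11)));
        // prints [3, 0, 8, 5, 2, 10, 7, 4, 1, 9, 6]
    }
}
```

Because 11 is prime, the step size 8 shares no factor with the table length, so the sequence visits every slot exactly once before repeating.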

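Double Hashing
[Figure: the actual keys K (k1 through k5), a subset of the universe of keys U, are mapped to table slots 0 through m-1. Keys k2 and k5 collide, H(k2) = H(k5), and a separate slot labeled H(k5) is also marked in the table.]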

External Chaining
• In external chaining, the hash table contains an array in which each component can hold more than one element of the hash table.
  – Essentially, a multidimensional array or a linked list of elements can exist for each table slot.
    • The typical implementation is that each slot contains a linked list.
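
A minimal sketch of external chaining, in which each slot holds a linked list of the keys that hash to it. The class name ChainedSet and the fixed number of slots are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of a hash set that resolves collisions with external chaining.
class ChainedSet {
    private final List<LinkedList<Object>> buckets = new ArrayList<>();
    private int size;

    ChainedSet(int slots) {
        for (int i = 0; i < slots; i++) {
            buckets.add(new LinkedList<>());      // one chain per table slot
        }
    }

    private LinkedList<Object> chainFor(Object key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    boolean add(Object key) {
        LinkedList<Object> chain = chainFor(key);
        if (chain.contains(key)) {
            return false;                         // already present
        }
        chain.add(key);                           // colliding keys share a chain
        size++;
        return true;
    }

    boolean contains(Object key) {
        return chainFor(key).contains(key);
    }

    boolean remove(Object key) {
        boolean removed = chainFor(key).remove(key);
        if (removed) {
            size--;
        }
        return removed;
    }

    int size() {
        return size;
    }
}
```

Unlike open addressing, deletion needs no special marker here, and the table can hold more elements than it has slots.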

External Chaining
[Figure: the actual keys K (k1 through k5), a subset of the universe of keys U, are mapped to table slots 0 through m-1. The labels H(k1) through H(k5) mark where each key is stored.]

Load Factor
• The load factor is a fraction that represents the number of elements stored in the table divided by the size of the table’s array.
  – α = (the number of elements stored in the table) / (the size of the table’s array)

Load Factor
  – If open addressing is used, each table slot holds at most one element, so the load factor can never be greater than 1.
  – If external chaining is used, each table slot can hold many elements, so the load factor may be greater than 1.

Hashing Analysis
• The worst case for hashing is the case where every key is hashed into the same slot.
  – Θ(n): linear time.
• The average time can be much faster.

Average Search Analysis
Each formula gives the average number of table elements examined in a successful search, where α is the load factor and n is the number of elements.
• Searching with linear probing.
  – For a table that is not near full:
    • ½ (1 + 1 / (1 - α))
  – For a table that is full or near full:
    • Math.sqrt(n (π / 8))
• Searching with double hashing.
  – (-ln(1 - α)) / α, where ln is the natural logarithm
• Searching with chained hashing.
  – 1 + (α / 2)
• See Figure 11.6 in Main, page 561.
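
To compare the formulas numerically, the sketch below evaluates each one for a given load factor. The class and method names are illustrative, and the formulas are taken directly from the slide.

```java
// Sketch: evaluate the average-search formulas for a given load factor.
class SearchCost {
    static double loadFactor(int elements, int tableLength) {
        return (double) elements / tableLength;
    }

    // Linear probing, table not near full: 1/2 (1 + 1 / (1 - a))
    static double linearProbing(double a) {
        return 0.5 * (1 + 1 / (1 - a));
    }

    // Double hashing: -ln(1 - a) / a   (Math.log is the natural logarithm)
    static double doubleHashing(double a) {
        return -Math.log(1 - a) / a;
    }

    // Chained hashing: 1 + a / 2
    static double chaining(double a) {
        return 1 + a / 2;
    }

    public static void main(String[] args) {
        for (double a : new double[] {0.5, 0.75, 0.9}) {
            System.out.printf("a=%.2f linear=%.2f double=%.2f chained=%.2f%n",
                    a, linearProbing(a), doubleHashing(a), chaining(a));
        }
    }
}
```

At a load factor of 0.5 the three strategies are close (1.5, about 1.39, and 1.25 elements examined); at 0.9 linear probing rises to 5.5 while chaining stays at 1.45.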

Coding Example
• Search Times program that demonstrates linear search, binary search, and hashing.
  – The hashing uses the Hashtable class.
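
The Search Times program itself is not reproduced in these slides. As a stand-in, the hypothetical sketch below finds the same value with a linear search, with Arrays.binarySearch, and with a java.util.Hashtable that maps each value to its array position. The class name SearchDemo and its methods are invented for this example.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical sketch comparing linear, binary, and hashed lookups.
class SearchDemo {
    // Linear search: examine every element until the target is found.
    static int linearSearch(int[] data, int target) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // Build a Hashtable from value to array position.
    static Hashtable<Integer, Integer> buildIndex(int[] data) {
        Hashtable<Integer, Integer> index = new Hashtable<>();
        for (int i = 0; i < data.length; i++) {
            index.put(data[i], i);
        }
        return index;
    }

    // Hashed search: one table lookup instead of scanning the array.
    static int hashedSearch(Hashtable<Integer, Integer> index, int target) {
        Integer position = index.get(target);
        return position == null ? -1 : position;
    }

    public static void main(String[] args) {
        int[] data = new int[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = 2 * i;                      // sorted, distinct values
        }
        Hashtable<Integer, Integer> index = buildIndex(data);
        int target = data[data.length - 1];
        System.out.println(linearSearch(data, target));
        System.out.println(Arrays.binarySearch(data, target)); // needs sorted data
        System.out.println(hashedSearch(index, target));
    }
}
```

All three calls return the same position; timing each one (for example with System.nanoTime) shows the difference in cost.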

Hashing
• Java provides the Hashtable class, but it also provides two other hash-based classes.
  – The HashMap class is a hash table implementation of the Map interface (key-to-value mappings).
  – The HashSet class is a hash table implementation of the Set interface (a collection with no duplicate elements).
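
A short usage sketch of the two classes: HashMap counts how often each word occurs, and HashSet keeps only the distinct words. The class name HashCollectionsDemo and its methods are illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: typical uses of HashMap and HashSet.
class HashCollectionsDemo {
    // HashMap: associate each word (key) with its number of occurrences (value).
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);  // insert 1 or add 1 to the count
        }
        return counts;
    }

    // HashSet: keep one copy of each word; duplicates are ignored.
    static Set<String> distinctWords(String[] words) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            seen.add(word);
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] words = {"hash", "table", "hash", "map", "hash"};
        System.out.println(countWords(words).get("hash"));  // 3
        System.out.println(distinctWords(words).size());    // 3
    }
}
```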