
dsauet.weebly.com
Lecture 7: Collision Resolution Techniques
Azeem Iqbal
University of Engineering and Technology, Lahore (Faisalabad Campus)
©AZEEM IQBAL Data Structures and Algorithms - Spring 2019

What is a Collision
• Ideally, a hash function maps each key to a different address. In practice it is rarely possible to create such a hash function, and the resulting problem is called a collision.
• A collision is the condition where two records hash to the same location.

Collision Resolution Techniques
• The process of finding an alternate location is called collision resolution.
• Even though hash tables have collision problems, in many cases they are more efficient than other data structures, such as search trees.
• There are a number of collision resolution techniques; the most popular are direct chaining and open addressing.

Collision Resolution Techniques
• Direct Chaining: an application of an array of linked lists
  o Separate chaining
• Open Addressing: array-based implementation
  o Linear probing (linear search)
  o Quadratic probing (nonlinear search)
  o Double hashing (uses two hash functions)

Separate Chaining
• Collision resolution by chaining combines a linked representation with a hash table.
• When two or more records hash to the same location, these records are organized into a singly linked list called a chain.
• The idea is to make each cell of the hash table point to a linked list of the records that have the same hash function value.

Separate Chaining
Let us consider a simple hash function "key mod 7" and the key sequence 50, 700, 76, 85, 92, 73, 101.
• Initial empty hash table: slots 0 to 6 are all empty.
• Insert 50: 50 mod 7 = 1, so 50 goes in slot 1.
• Insert 700 and 76: 700 mod 7 = 0 and 76 mod 7 = 6, so 700 goes in slot 0 and 76 in slot 6.
Table after these insertions:
0: 700
1: 50
2: (empty)
3: (empty)
4: (empty)
5: (empty)
6: 76

Separate Chaining
Keys: 50, 700, 76, 85, 92, 73, 101.
• Insert 85 (85 mod 7 = 1): collision occurs, add to chain.
• Insert 92 (92 mod 7 = 1): collision occurs, add to chain.
Table after these insertions:
0: 700
1: 50 -> 85 -> 92
2: (empty)
3: (empty)
4: (empty)
5: (empty)
6: 76

Separate Chaining
Keys: 50, 700, 76, 85, 92, 73, 101.
• Insert 73 (73 mod 7 = 3): slot 3 is empty, so 73 starts a new chain.
• Insert 101 (101 mod 7 = 3): collision occurs, add to chain.
Final table:
0: 700
1: 50 -> 85 -> 92
2: (empty)
3: 73 -> 101
4: (empty)
5: (empty)
6: 76
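The chaining example can be reproduced with a minimal Python sketch. It assumes a Python list stands in for each singly linked chain, and the function name `build_chained_table` is hypothetical, chosen only for this illustration.

```python
def build_chained_table(keys, size=7):
    """Insert keys into `size` buckets, appending to the chain on collision."""
    # One chain per slot; a Python list stands in for a linked list here.
    table = [[] for _ in range(size)]
    for key in keys:
        # Hash function from the slides: key mod 7 (size defaults to 7).
        table[key % size].append(key)
    return table


table = build_chained_table([50, 700, 76, 85, 92, 73, 101])
for index, chain in enumerate(table):
    print(index, chain)
```

Appending to the end of a chain keeps the keys in insertion order, which matches the slides. Inserting at the head of a real linked list would be an equally valid choice.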

Separate Chaining
Advantages:
1) Simple to implement.
2) The hash table never fills up; we can always add more elements to a chain.
Disadvantages:
1) Wastage of space (some parts of the hash table are never used).
2) If a chain becomes long, search time can become O(n) in the worst case.
3) Uses extra space for links.

Open Addressing
• In open addressing all keys are stored in the hash table itself.
• This approach is also known as closed hashing.
• This procedure is based on probing.
• A collision is resolved by probing.

Open Addressing - Linear Probing
• The interval between probes is fixed at 1.
• In linear probing, we search the hash table sequentially, starting from the original hash location.
• If a location is occupied, we check the next location.
• We wrap around from the last table location to the first table location if necessary.
• The function for rehashing is the following: (hash(x) + i) % tablesize, for i = 1, 2, 3, ...

Linear Probing
1. If slot hash(x) % tablesize is full, then we try (hash(x) + 1) % tablesize.
2. If (hash(x) + 1) % tablesize is also full, then we try (hash(x) + 2) % tablesize.
3. If (hash(x) + 2) % tablesize is also full, then we try (hash(x) + 3) % tablesize.
4. And so on, until an empty slot is found.

Linear Probing
Let us consider a simple hash function "key mod 7" and the key sequence 50, 700, 76, 85, 92, 73, 101.
• Initial empty hash table: slots 0 to 6 are all empty.
• Insert 50: 50 mod 7 = 1, so 50 goes in slot 1.
• Insert 700 and 76: 700 mod 7 = 0 and 76 mod 7 = 6, so 700 goes in slot 0 and 76 in slot 6.
Table after these insertions:
0: 700
1: 50
2: (empty)
3: (empty)
4: (empty)
5: (empty)
6: 76

Linear Probing
Keys: 50, 700, 76, 85, 92, 73, 101.
• Insert 85 (85 mod 7 = 1): collision occurs, insert 85 at the next free slot, which is slot 2.
Table after this insertion:
0: 700
1: 50
2: 85
3: (empty)
4: (empty)
5: (empty)
6: 76

Linear Probing
Keys: 50, 700, 76, 85, 92, 73, 101.
• Insert 92 (92 mod 7 = 1): collision occurs as 50 is there at index 1. Insert at the next free slot, which is slot 3.
Table after this insertion:
0: 700
1: 50
2: 85
3: 92
4: (empty)
5: (empty)
6: 76

Linear Probing
Keys: 50, 700, 76, 85, 92, 73, 101.
• Insert 73 (73 mod 7 = 3): slot 3 is occupied, so 73 goes in slot 4.
• Insert 101 (101 mod 7 = 3): slots 3 and 4 are occupied, so 101 goes in slot 5.
Final table:
0: 700
1: 50
2: 85
3: 92
4: 73
5: 101
6: 76
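The linear probing example can be checked with a short Python sketch. It assumes `None` marks an empty slot and integer keys hashed with key mod table size. The name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(table, key):
    """Place key in the first free slot at or after key % len(table)."""
    size = len(table)
    home = key % size                 # original hash location
    for i in range(size):             # at most `size` probes
        slot = (home + i) % size      # wrap around past the last slot
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    linear_probe_insert(table, key)
print(table)  # [700, 50, 85, 92, 73, 101, 76]
```

All seven keys fit because the table has exactly seven slots. An eighth insertion would raise the error.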

Linear Probing
• One of the problems with linear probing is that table items tend to cluster together in the hash table.
• This means that the table contains groups of consecutively occupied locations. This phenomenon is called primary clustering.
• Clusters can get close to one another and merge into a larger cluster.
• Thus, one part of the table might be quite dense, even though another part has relatively few items.
• Clustering causes long probe searches and therefore decreases the overall efficiency.

Linear Probing Insertion
The insertion algorithm is as follows:
• Use the hash function to find the index for a record.
• If that spot is already in use, use the next available spot at a "higher" index.
• Treat the hash table as if it is round: if you hit the end of the hash table, go back to the front.
Each contiguous group of records (records in adjacent indices without any empty spots between them) in the table is called a cluster.

Linear Probing Searching
The search algorithm is as follows:
• Use the hash function to find the index where an item should be.
• If it isn't there, search the records after that hash location (remember to treat the table as circular) until either the item is found or an empty spot is found. If there is an empty spot in the table before the record is found, the record is not there.
NOTE: It is important not to search the whole array until you get back to the starting index. As soon as you see an empty spot, your search needs to stop.
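The search rule above can be sketched in Python as follows. The sketch assumes `None` marks an empty spot and keys are hashed with key mod table size. The name `linear_probe_search` is hypothetical, and the return value -1 for "not found" is a convention chosen for this example.

```python
def linear_probe_search(table, key):
    """Return the slot holding key, or -1 if key is not in the table."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size      # treat the table as circular
        if table[slot] is None:       # empty spot: stop, key is absent
            return -1
        if table[slot] == key:
            return slot
    return -1                         # table is full and key is absent


# State of the slide example after inserting 50, 700, 76, 85 and 92.
table = [700, 50, 85, 92, None, None, 76]
print(linear_probe_search(table, 92))   # 3
print(linear_probe_search(table, 99))   # -1
```

The search for 99 starts at slot 1, passes 50, 85 and 92, and stops at the empty slot 4 without scanning the rest of the table.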

Linear Probing Removal
The removal algorithm is a bit trickier because, after an object is removed, records in the same cluster with a higher index than the removed object have to be adjusted. Otherwise the empty spot left by the removal will cause valid searches to fail.
The algorithm is as follows:
• Find the record and remove it, making the spot empty.
• For all records that follow it in the cluster, do the following:
  o Determine the hash index of the record.
  o Determine whether the empty spot lies between the hash index and the current location of the record (treating the table as circular).
  o If it does, move the record to the empty spot; the record's old location is now the empty spot.
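A possible Python sketch of this removal procedure is shown below. It assumes `None` marks an empty spot and integer keys hashed with key mod table size. The name `linear_probe_remove` and the circular-distance test are choices made for this illustration, not taken from the slides.

```python
def linear_probe_remove(table, key):
    """Remove key and shift later cluster members back; return True if removed."""
    size = len(table)
    slot = key % size
    for _ in range(size):                 # find the record
        if table[slot] is None:
            return False                  # hit an empty spot: key is absent
        if table[slot] == key:
            break
        slot = (slot + 1) % size
    else:
        return False                      # scanned a full table without a match

    table[slot] = None                    # make the spot empty
    hole = slot
    j = (hole + 1) % size
    while table[j] is not None:           # walk the rest of the cluster
        home = table[j] % size            # hash index of this record
        # The hole lies on the circular path home -> j when it is closer to home.
        if (hole - home) % size < (j - home) % size:
            table[hole] = table[j]        # move the record into the hole
            table[j] = None               # its old location is the new hole
            hole = j
        j = (j + 1) % size
    return True


table = [700, 50, 85, 92, 73, 101, 76]
linear_probe_remove(table, 50)
print(table)  # [700, 85, 92, 73, 101, None, 76]
```

After 50 is removed, the keys 85, 92, 73 and 101 each move back one slot, while 76 and 700 stay put because the empty spot is not on their probe paths.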

Linear Probing Removal (More Efficient Algorithm)
Delete(D): H(D) = 4, so D is found in slot 4 and the slot is emptied.
Before: 0: (empty), 1: B, 2: (empty), 3: A, 4: D, 5: R, 6: (empty)
After: 0: (empty), 1: B, 2: (empty), 3: A, 4: (empty), 5: R, 6: (empty)

Linear Probing
• Unfortunately, simply emptying the deleted element has a negative side effect on the way the search operation works.
• Since the data retrieval operation relies on blank hash elements as the signal to stop probing, there is the possibility that a deletion operation will make some data items unfindable.
• Consider a search for 'R' (which has the same hash code as 'A') that is attempted after 'D' has been deleted.
• The data 'R' will never be found, as the probing terminates too early; this is because the hash element that stored 'D' (and kept the probing going) has been deleted.
Table: 1: B, 3: A, 4: D (deleted), 5: R; slots 0, 2 and 6 are empty.

Linear Probing
The solution to this problem is to define two different kinds of blank hash elements:
• Purely empty element, which has never stored data; and
• Empty but deleted element, which stored data that has since been deleted.

Linear Probing
• These two kinds distinguish how a blank hash element came to exist, which is necessary to make the hash search work again.
• When a data item is deleted, it is not completely cleared, but is instead given the "empty but deleted" mark. The search function must then be modified so that it terminates probing only on a purely empty element, and continues probing if an "empty but deleted" element is encountered.
Table: 1: B, 3: A, 4: empty but deleted, 5: R; slots 0, 2 and 6 are purely empty.

Linear Probing
An add operation can store data in the "empty but deleted" element. As the deleted flag is only necessary to continue searching, adding data to one of these elements makes it work like just another normal element again (as far as the probing algorithm is concerned).
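The two kinds of blank element can be sketched in Python as below. Everything named here is an assumption for illustration: `EMPTY`, `DELETED`, `HASH_CODES`, `slide_hash` and the three `probe_*` functions are hypothetical. The hash codes of A, D and R follow the slides (D hashes to 4, R has the same code as A); the code 1 for B is assumed from its position in the table.

```python
EMPTY = None            # purely empty: has never stored data
DELETED = object()      # "empty but deleted" marker

HASH_CODES = {"B": 1, "A": 3, "D": 4, "R": 3}   # B's code is an assumption


def slide_hash(key):
    return HASH_CODES[key]


def probe_search(table, key):
    """Return the slot holding key, or -1; stop only on a purely empty slot."""
    size = len(table)
    home = slide_hash(key) % size
    for i in range(size):
        slot = (home + i) % size
        if table[slot] is EMPTY:
            return -1
        if table[slot] is not DELETED and table[slot] == key:
            return slot
    return -1


def probe_delete(table, key):
    """Mark the key's slot as deleted instead of clearing it."""
    slot = probe_search(table, key)
    if slot != -1:
        table[slot] = DELETED
    return slot != -1


def probe_add(table, key):
    """Store key in the first empty or deleted slot (key assumed not present)."""
    size = len(table)
    home = slide_hash(key) % size
    for i in range(size):
        slot = (home + i) % size
        if table[slot] is EMPTY or table[slot] is DELETED:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [EMPTY, "B", EMPTY, "A", "D", "R", EMPTY]
probe_delete(table, "D")
print(probe_search(table, "R"))   # 5: probing continues past the deleted slot
print(probe_add(table, "D"))      # 4: the deleted slot is reused
```

A fuller implementation would usually first check that the key is not already stored further along the probe path before reusing a deleted slot. That check is omitted here to keep the sketch short.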

Open Addressing - Quadratic Probing
• In quadratic probing, the i-th probe examines slot (hash(x) + i^2) % tablesize, for i = 0, 1, 2, ...

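Quadratic Probing - Example
Table size is 7.
• Insert 76: 76 % 7 = 6; slot 6 is free. Probes: 1
• Insert 40: 40 % 7 = 5; slot 5 is free. Probes: 1
• Insert 48: 48 % 7 = 6; collision. (6 + 1) % 7 = 0 is free. Probes: 2
• Insert 5: 5 % 7 = 5; collision. (5 + 1) % 7 = 6 is occupied. (5 + 4) % 7 = 2 is free. Probes: 3
• Insert 55: 55 % 7 = 6; collision. (6 + 1) % 7 = 0 is occupied. (6 + 4) % 7 = 3 is free. Probes: 3
Final table:
0: 48
1: (empty)
2: 5
3: 55
4: (empty)
5: 40
6: 76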

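Quadratic Probing - Example
Table size is 7.
• Insert 76: 76 % 7 = 6; slot 6 is free.
• Insert 93: 93 % 7 = 2; slot 2 is free.
• Insert 40: 40 % 7 = 5; slot 5 is free.
• Insert 35: 35 % 7 = 0; slot 0 is free.
• Insert 47: 47 % 7 = 5; collision. The probe sequence (5 + i^2) % 7 only ever visits slots 5, 6, 2 and 0, which are all occupied, so 47 cannot be placed even though slots 1, 3 and 4 are free.
Table:
0: 35
1: (empty)
2: 93
3: (empty)
4: (empty)
5: 40
6: 76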
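Both quadratic probing examples can be replayed with a small Python sketch. It assumes the probe sequence (key % size + i*i) % size with `None` marking an empty slot. The name `quadratic_probe_insert` is hypothetical, and returning `None` on failure is a convention chosen for this example.

```python
def quadratic_probe_insert(table, key):
    """Insert key by quadratic probing; return probes used, or None on failure."""
    size = len(table)
    home = key % size
    # i*i % size repeats once i reaches size, so `size` attempts are enough.
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return i + 1              # number of slots examined
    return None                       # no reachable free slot


first = [None] * 7
print([quadratic_probe_insert(first, k) for k in [76, 40, 48, 5, 55]])
print(first)

second = [None] * 7
for k in [76, 93, 40, 35]:
    quadratic_probe_insert(second, k)
print(quadratic_probe_insert(second, 47))   # None: 47 cannot be placed
```

The first run places all five keys. In the second run the insertion of 47 fails while three slots are still free.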

Quadratic Probing
• With linear probing we know that we will always find an open spot if one exists (it might be a long search, but we will find it).
• However, this is not the case with quadratic probing unless you take care in choosing the table size.
• In order to guarantee that quadratic probing will always find an open spot, the table must meet these requirements:
  o Its size must be a prime number.
  o It must never be more than half full (even by one element).

Quadratic Probing
• Limitation: at most half of the table can be used as alternative locations to resolve collisions.
• This means that once the table is more than half full, it is difficult to find an empty spot.
• Quadratic probing also suffers from a problem known as secondary clustering: elements that hash to the same hash key will always probe the same alternative cells.
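The "at most half of the table" limitation can be observed directly. The sketch below assumes the probe sequence (home + i*i) % size and counts how many different slots it can ever reach. The name `distinct_quadratic_slots` is hypothetical.

```python
def distinct_quadratic_slots(size, home=0):
    """Return the set of slots reachable by (home + i*i) % size."""
    return {(home + i * i) % size for i in range(size)}


print(len(distinct_quadratic_slots(7)))    # 4 of 7 slots
print(len(distinct_quadratic_slots(11)))   # 6 of 11 slots
print(len(distinct_quadratic_slots(8)))    # 3 of 8 slots
```

For the prime sizes 7 and 11 the sequence reaches just over half of the slots, so a free slot is always reachable while the table is at most half full. For the non-prime size 8 it reaches fewer than half.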

Double Hashing
• Double hashing works on a similar idea to linear and quadratic probing.
• Use a big table and hash into it. Whenever a collision occurs, choose another spot in the table to put the value.
• The difference here is that instead of choosing the next opening, a second hash function is used to determine the location of the next spot.

Double Hashing
For example, given hash functions hash1 and hash2 and a key, do the following:
• Check location hash1(key). If it is empty, put the record in it.
• If it is not empty, calculate hash2(key).
• Check whether hash1(key) + hash2(key) is open. If it is, put the record in it.
• Repeat with hash1(key) + 2*hash2(key), hash1(key) + 3*hash2(key) and so on, until an opening is found.
Each location is taken modulo the table size.
Note: You must take care in choosing hash2. hash2 CANNOT ever return 0, and hash2 must be chosen so that all cells will be probed eventually.

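Double Hashing - Example
One good choice is to choose a prime number R < Size and use: Hash2(x) = R - (x mod R)
Here the table size is 7 and R = 5.
• Insert 76: 76 % 7 = 6; slot 6 is free. Probes: 1
• Insert 93: 93 % 7 = 2; slot 2 is free. Probes: 1
• Insert 40: 40 % 7 = 5; slot 5 is free. Probes: 1
• Insert 47: 47 % 7 = 5; collision. Hash2(47) = 5 - (47 % 5) = 3, so we try (5 + 3) % 7 = 1, which is free. Probes: 2
Table after these insertions:
0: (empty)
1: 47
2: 93
3: (empty)
4: (empty)
5: 40
6: 76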

Double Hashing - Example
• Insert 10: 10 % 7 = 3; slot 3 is free. Probes: 1
• Insert 55: 55 % 7 = 6; collision. Hash2(55) = 5 - (55 % 5) = 5, so we try (6 + 5) % 7 = 4, which is free. Probes: 2
Final table:
0: (empty)
1: 47
2: 93
3: 10
4: 55
5: 40
6: 76
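The double hashing example can be replayed with the Python sketch below. It assumes hash1(x) = x mod 7 and Hash2(x) = R - (x mod R) with R = 5, as on the slides, and `None` for an empty slot. The name `double_hash_insert` and its return convention are hypothetical.

```python
def double_hash_insert(table, key, r=5):
    """Insert key by double hashing; return probes used, or None on failure."""
    size = len(table)
    home = key % size                 # hash1(key)
    step = r - (key % r)              # Hash2(key): always in 1..r, never 0
    for i in range(size):
        slot = (home + i * step) % size
        if table[slot] is None:
            table[slot] = key
            return i + 1              # number of slots examined
    return None


table = [None] * 7
print([double_hash_insert(table, k) for k in [76, 93, 40, 47, 10, 55]])
print(table)   # [None, 47, 93, 10, 55, 40, 76]
```

Because the table size 7 is prime and the step is between 1 and 5, the step never shares a factor with the table size, so the probe sequence can reach every slot.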