Hashing CS 105 10/02/05
- Slides: 15
 
	Hashing CS 105 10/02/05
 
	Hashing - Introduction
	- In a dictionary, if the key can also serve as the index into the array that stores the entries, searching for and inserting items is very fast
	- Example: Empdata[1000], index = employee ID number
	  - search for the employee with employee number = 500
	  - return: Empdata[500]
	  - Running time: O(1)
	Hashing Slide 10/02/05
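The Empdata example can be sketched in Java as follows. This is a minimal illustration, not code from the course: the class name `EmployeeDirectory`, its method names, and the use of a `String` as the stored record are all assumptions.

```java
// Direct addressing: the employee ID doubles as the array index.
// EmployeeDirectory and its methods are hypothetical names for illustration.
class EmployeeDirectory {
    // Valid employee IDs are assumed to be 0..999.
    private final String[] empData = new String[1000];

    // Insert: a single array write, O(1).
    void insert(int empId, String record) {
        empData[empId] = record;
    }

    // Search: a single array read, O(1). Returns null if nothing is stored.
    String find(int empId) {
        return empData[empId];
    }

    public static void main(String[] args) {
        EmployeeDirectory dir = new EmployeeDirectory();
        dir.insert(500, "Employee #500");
        System.out.println(dir.find(500)); // the record stored at index 500
    }
}
```

No searching takes place here: the key itself says where the record lives, which is why both operations take constant time.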
 
	Hash table
	- Hash table: a data structure, implemented as an array of objects, in which the search keys correspond to the array indexes
	- Insert and find operations involve straightforward array accesses: O(1) time complexity
 
	About hash tables
	- The first example was relatively easy because the employee number is an integer
	- Problem 1: the possible integer key values might be too large; creating an array big enough to hold them might be impractical
	  - Need to map large integer values to smaller array indexes
	- Problem 2: what if the key is a word written in the English alphabet (e.g. a last name)?
	  - Need to map names to integers (indexes)
 
	Large numbers -> small numbers
	- Hash function: converts a number from a large range into a number from a smaller range (the range of array indices)
	- Size of array
	  - Rule of thumb: the array size should be about twice the size of the data set (2s, where s is the number of items)
	  - For 50,000 words, use an array of 100,000 elements
 
	Hash function and modulo
	- Simplest hash function: use the modulo operator (returns the remainder)
	  - For example, 33 % 10 = 3
	  - General formula: LargeNumber % SmallRange
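A minimal Java sketch of the modulo hash function, together with the "twice the data set" sizing rule of thumb. The class and method names (`ModuloHash`, `hash`, `suggestedSize`) are hypothetical, and the code assumes keys are non-negative.

```java
// Hypothetical helper class illustrating LargeNumber % SmallRange.
class ModuloHash {
    // Maps a non-negative key into the range 0..tableSize-1.
    // Assumption: largeNumber >= 0; Java's % yields a negative
    // remainder for negative operands.
    static int hash(long largeNumber, int tableSize) {
        return (int) (largeNumber % tableSize);
    }

    // Rule of thumb: make the array about twice the number of items.
    static int suggestedSize(int itemCount) {
        return 2 * itemCount;
    }

    public static void main(String[] args) {
        System.out.println(hash(33, 10));         // prints 3
        System.out.println(suggestedSize(50000)); // prints 100000
    }
}
```

If negative keys are possible, `Math.floorMod` is one way to keep the result inside the index range.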
 
	Hash functions for names
	- Sum of Digits Method
	  - Map the alphabet A-Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.)
	  - Add up the values of the letters
	- For example, "cats"
	  - (c=3, a=1, t=20, s=19) 3+1+20+19=43
	  - "cats" will be stored using index = 43
	- Can use the modulo operation (%) if you need to map to a smaller array
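The letter-sum method can be sketched in Java as below. The names `LetterSumHash`, `letterSum`, and `hash` are hypothetical, and ignoring any character outside a-z is an assumption made here, not something the slide specifies.

```java
import java.util.Locale;

// Hypothetical helper class illustrating the letter-sum method.
class LetterSumHash {
    // Adds up letter values with a=1, b=2, ..., z=26.
    // Assumption: characters other than a-z are ignored.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1;
            }
        }
        return sum;
    }

    // Reduces the sum to an index in 0..tableSize-1 for a smaller array.
    static int hash(String word, int tableSize) {
        return letterSum(word) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("cats")); // prints 43
        System.out.println(hash("cats", 10));  // prints 3
    }
}
```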
 
	Collisions
	- Problem
	  - Too many words share the same index
	  - "was", "tin", "give", "tend", "moan", "tick" and several other words add up to 43
	- These are called collisions (cases where two different search keys hash to the same index value)
	- Can occur even when dealing with integers
	  - Suppose the size of the hash table is 100
	  - Keys 158 and 358 hash to the same value (58) when using the modulo hash function
 
	Collision resolution policy
	- Need to know what to do when a collision occurs; i.e., during an insert operation, what if the array slot is already occupied?
	- Most common policy: go to the next available slot
	  - "Wrap around" to the start of the array if necessary
	- Consequence: when searching, go to the slot given by the hash function, but first check whether the element there is the one you are looking for. If not, try the following slots.
	  - How do you know if the element is not in the array?
 
	Probe sequence
	- The sequence of array indexes (slots) that a key value could map to
	- The first index in the probe sequence is the home position, the value of the hash function. The following indexes are the alternative slots
	- Example: suppose the array size is 10, and the hash function is h(K) = K % 10. The probe sequence for K = 25 is:
	  - 5, 6, 7, 8, 9, 0, 1, 2, 3, 4
	  - Here, we assume the most common collision resolution policy of going to the next slot: p(K, i) = i
	- Goal: the probe sequence should eventually visit every array slot
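The probe sequence from the example can be computed with a few lines of Java. This is a sketch under the slide's assumptions (h(K) = K % M and p(K, i) = i); the class name `ProbeSequence` and the method `probeSequence` are hypothetical, and keys are assumed to be non-negative.

```java
import java.util.Arrays;

// Hypothetical helper class that lists the slots a key would try, in order.
class ProbeSequence {
    // Home position: h(K) = K % M (assumes K >= 0).
    static int h(int key, int m) {
        return key % m;
    }

    // Linear probing: the i-th probe is i slots past the home position.
    static int p(int key, int i) {
        return i;
    }

    // Returns all M slots in the order they would be probed for this key.
    static int[] probeSequence(int key, int m) {
        int[] seq = new int[m];
        int home = h(key, m);
        for (int i = 0; i < m; i++) {
            seq[i] = (home + p(key, i)) % m; // wraps around the array
        }
        return seq;
    }

    public static void main(String[] args) {
        // prints [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
        System.out.println(Arrays.toString(probeSequence(25, 10)));
    }
}
```

Because p(K, i) = i steps through consecutive offsets 0 to M-1, every slot appears exactly once, which satisfies the goal of exhausting the array.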
 
	Recap: hash table operations
	- Insert object Obj with key value K (HT is the table, M is its size)
	    home <- h(K)
	    for i <- 0 to M-1 do
	      pos <- (home + p(K, i)) % M
	      if HT[pos] is null then
	        HT[pos] <- Obj
	        return
	      else if HT[pos].getKey() = K then
	        throw exception "error: duplicate record" // alternative: overwrite
	    throw exception "error: table full"
	- Finding an object with key value K
	    home <- h(K)
	    for i <- 0 to M-1 do
	      pos <- (home + p(K, i)) % M
	      if HT[pos] is null then
	        throw exception "not found"
	      else if HT[pos].getKey() = K then
	        return HT[pos]
	    throw exception "not found"
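The insert and find operations above might look like this in Java. It is a sketch, not the course's implementation: the class `LinearProbingTable`, the integer keys with `String` values, and the choice of exception types are all assumptions made for illustration.

```java
import java.util.NoSuchElementException;

// Hypothetical hash table using linear probing (p(K, i) = i).
class LinearProbingTable {
    private static final class Entry {
        final int key;
        final String value;
        Entry(int key, String value) { this.key = key; this.value = value; }
    }

    private final Entry[] ht; // null means the slot is empty
    private final int m;      // table size M

    LinearProbingTable(int m) {
        this.m = m;
        this.ht = new Entry[m];
    }

    private int h(int key) { return Math.floorMod(key, m); }
    private int p(int key, int i) { return i; }

    void insert(int key, String value) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + p(key, i)) % m;
            if (ht[pos] == null) {          // free slot: store here
                ht[pos] = new Entry(key, value);
                return;
            }
            if (ht[pos].key == key) {       // alternative: overwrite
                throw new IllegalStateException("duplicate record: " + key);
            }
        }
        throw new IllegalStateException("table full");
    }

    String find(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + p(key, i)) % m;
            if (ht[pos] == null) break;     // empty slot ends the search
            if (ht[pos].key == key) return ht[pos].value;
        }
        throw new NoSuchElementException("not found: " + key);
    }
}
```

With a table of size 100, inserting keys 158 and 358 places the first at slot 58 and the second at slot 59, and `find` retraces the same probe sequence to locate each one.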
 
	Hash table operations
	- Note: although insert and find run in O(1) time under typical conditions, the worst-case time complexity is O(n)
	- Something to think about: characterize the worst-case scenarios for insert and find
 
	Removing elements
	- Removing an element from a hash table during a delete operation poses a problem
	- If we set the corresponding hash table entry to null, then subsequent find operations might not work properly
	  - Recall that in the find algorithm, seeing a null means the target element was not found, when in fact the element might be in a later slot
	- Solution: tombstone
	  - Arrange it so that deleted entries look null when inserting, but do not look null when searching
	  - Requires a simple flag on the objects stored
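One way to sketch tombstones in Java is shown below. The class `TombstoneTable` is hypothetical; it stores bare integer keys and keeps the deleted flag in a parallel boolean array rather than on the stored objects, which is a simplification of the flag the slide describes.

```java
// Hypothetical linear-probing set of int keys with tombstone deletion.
class TombstoneTable {
    private final Integer[] keys;    // null = slot never used
    private final boolean[] deleted; // true = tombstone left by remove
    private final int m;

    TombstoneTable(int m) {
        this.m = m;
        this.keys = new Integer[m];
        this.deleted = new boolean[m];
    }

    private int h(int key) { return Math.floorMod(key, m); }

    // Searching: a tombstone does NOT look null, so probing continues past it.
    boolean contains(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) return false;
            if (!deleted[pos] && keys[pos] == key) return true;
        }
        return false;
    }

    // Inserting: a tombstone DOES look null, so its slot can be reused.
    void insert(int key) {
        if (contains(key)) throw new IllegalStateException("duplicate record");
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null || deleted[pos]) {
                keys[pos] = key;
                deleted[pos] = false;
                return;
            }
        }
        throw new IllegalStateException("table full");
    }

    // Removing: mark the slot instead of setting it to null.
    void remove(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) return; // key is not in the table
            if (!deleted[pos] && keys[pos] == key) {
                deleted[pos] = true;
                return;
            }
        }
    }
}
```

For example, in a table of size 10 holding 25, 35, and 45 (slots 5, 6, 7), removing 35 leaves a tombstone in slot 6, so a later search for 45 still probes past slot 6 and finds it in slot 7.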
 
	Hash tables in Java
	- java.util.Hashtable
	- Important methods for the Hashtable class
	  - put(Object key, Object entry)
	  - Object get(Object key)
	  - remove(Object key)
	  - boolean containsKey(Object key)
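A short usage sketch of the four methods listed above. The class name `HashtableDemo`, the `build` helper, and the sample records are hypothetical; the example uses the generic form `Hashtable<Integer, String>` rather than the raw `Object` signatures shown on the slide.

```java
import java.util.Hashtable;

// Hypothetical demo of java.util.Hashtable with employee numbers as keys.
class HashtableDemo {
    static Hashtable<Integer, String> build() {
        Hashtable<Integer, String> table = new Hashtable<>();
        table.put(500, "Employee #500"); // insert
        table.put(158, "Employee #158");
        return table;
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> table = build();
        System.out.println(table.get(500));         // prints Employee #500
        System.out.println(table.containsKey(158)); // prints true
        table.remove(158);                          // delete by key
        System.out.println(table.containsKey(158)); // prints false
        System.out.println(table.get(158));         // prints null
    }
}
```

Note that `get` returns null for a missing key, and that `Hashtable` rejects null keys and null values.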
 
	Summary
	- Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations
	  - Caveat: O(n) in the worst case because of the possibility of collisions
	- Requires a hash function (maps keys to array indices) and a collision resolution policy
	  - A probe sequence describes the sequence of array slots that an object could occupy, given its key
	- In Java: use the Hashtable class
