Symbol Tables
- Symbol tables are used by compilers to keep track of information about
  - variables
  - functions
  - class names
  - type names
  - temporary variables
  - etc.
- Typical symbol table operations are Insert, Delete and Search
  - It's a dictionary structure!
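
To make the dictionary view concrete, here is a minimal sketch of the three operations. The class name `SymbolTable`, its method names, and the sample entries are hypothetical; Python's built-in `dict` stands in for whatever structure a real compiler would use.

```python
class SymbolTable:
    """Toy symbol table: a thin wrapper over Python's built-in dict."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, info):
        # Adds the name, or overwrites its information if already present.
        self._entries[name] = info

    def search(self, name):
        # Returns None when the name has not been inserted.
        return self._entries.get(name)

    def delete(self, name):
        # Silently ignores names that are not present.
        self._entries.pop(name, None)


table = SymbolTable()
table.insert("count", {"kind": "variable", "type": "int"})
table.insert("main", {"kind": "function", "type": "int"})
found = table.search("count")
table.delete("count")
missing = table.search("count")
```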

Symbol Tables
- What kind of information is usually stored in a symbol table?
  - type
  - storage class
  - size
  - scope
  - stack frame offset
  - register
- We also need a way to keep track of reserved words.
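
One way to picture a single table entry is as a small record with one field per item listed above. This is only a sketch: the class name `SymbolInfo`, the field names, and the sample values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SymbolInfo:
    """Hypothetical record holding what the table knows about one name."""
    type_name: str                       # e.g. "int"
    storage_class: str                   # e.g. "auto", "static"
    size: int                            # size in bytes
    scope: int                           # nesting depth of the declaration
    frame_offset: Optional[int] = None   # offset in the stack frame, if any
    register: Optional[str] = None       # assigned register, if any


entry = SymbolInfo(type_name="int", storage_class="auto", size=4,
                   scope=1, frame_offset=-8)
```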

Symbol Tables
- How is a symbol table stored?
  - array/linked list
    - simple, but linear lookup time
    - However, we may use a sorted array for reserved words, since they are generally few and known in advance.
  - balanced tree
    - $O(\lg n)$ lookup time
  - hash table
    - most common implementation
    - $O(1)$ expected time for dictionary operations
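
The sorted-array idea for reserved words can be sketched with a binary search. The word list and the function name `is_reserved` are hypothetical; `bisect_left` comes from Python's standard `bisect` module.

```python
from bisect import bisect_left

# Reserved words are known in advance, so the array is sorted once.
RESERVED = sorted(["break", "else", "for", "if", "return", "while"])


def is_reserved(word):
    """Binary search in the sorted array: O(lg n) comparisons."""
    i = bisect_left(RESERVED, word)
    return i < len(RESERVED) and RESERVED[i] == word
```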

Hashing
- Hash tables
  - use an array of size m to store elements
  - given key k (the identifier name), use a function h to compute the index h(k) for that key
  - collisions are possible
    - two keys hash into the same slot
- Hash functions
  - A good hash function
    - is easy to compute
    - avoids collisions (by breaking up patterns in the keys and uniformly distributing the hash values)

Hashing
- In the following slides:
  - k is a key
  - h(k) is the hash function
  - m is the size of the hash table
  - n is the number of keys in the hash table

Hashing
- What makes a good hash function?
  - It is easy to compute
  - It minimizes collisions
    - hash = to chop into small pieces (Merriam-Webster)
    - hash = to chop up any patterns in the keys so that the results are uniformly distributed (CS 311)

Hashing
- When the key is a string, we generally use the ASCII values of its characters in some way.
- Examples for $k = c_1 c_2 c_3 \ldots c_x$:
  - $h(k) = (c_1 \cdot 128^{x-1} + c_2 \cdot 128^{x-2} + \ldots + c_x \cdot 128^{0}) \bmod m$
  - $h(k) = (c_1 + c_2 + \ldots + c_x) \bmod m$
  - $h(k) = (h_1(c_1) + h_2(c_2) + \ldots + h_x(c_x)) \bmod m$, where each $h_i$ is an independent hash function.
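
The first two string formulas can be sketched as follows. The function names are hypothetical, and the keys are assumed to contain only ASCII characters (codes below 128). The radix-128 version applies Horner's rule and reduces mod m at every step, which gives the same result as the formula while keeping intermediate values small.

```python
def radix128_hash(key, m):
    """Treats the string as a base-128 number, reduced mod m."""
    h = 0
    for ch in key:
        h = (h * 128 + ord(ch)) % m   # Horner's rule, one character at a time
    return h


def additive_hash(key, m):
    """Sums the character codes, reduced mod m."""
    return sum(ord(ch) for ch in key) % m
```

The additive form ignores character order, so anagrams such as "abc" and "cab" always land in the same slot; the radix-128 form takes position into account.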

Hash functions: Truncation
- Ignore part of the key and use the remaining part directly as the index.
- Example: if the keys are 8-digit numbers and the hash table has 1000 entries, then the first, fourth and eighth digits could make up the hash value.
- Not a very good method: it does not distribute keys uniformly.
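
A small sketch of the truncation example, assuming keys are non-negative integers of at most 8 digits (shorter keys are zero-padded on the left). The function name is hypothetical.

```python
def truncation_hash(key):
    """Uses the 1st, 4th and 8th digits of an 8-digit key as the index."""
    digits = f"{key:08d}"                 # zero-pad to exactly 8 digits
    return int(digits[0] + digits[3] + digits[7])


a = truncation_hash(62538194)
b = truncation_hash(62938194)   # differs from the first key only in digit 3
```

Both keys map to the same index because the digit in which they differ is ignored, which illustrates why the method distributes keys poorly.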

Hash functions: Folding
- Break up the key into parts and combine them in some way.
- Example: if the keys are 9-digit numbers, break up a key into three 3-digit numbers and add them up.
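
The folding example might look like this in code. The sum of three 3-digit numbers can reach 2997, so this sketch adds a final reduction mod m to keep the index inside the table; that step and the default m = 1000 are assumptions, not part of the slide.

```python
def folding_hash(key, m=1000):
    """Splits a 9-digit key into three 3-digit parts and adds them."""
    digits = f"{key:09d}"                                  # zero-pad to 9 digits
    parts = [int(digits[i:i + 3]) for i in range(0, 9, 3)]
    return sum(parts) % m                                  # assumed final reduction


index = folding_hash(123456789)   # 123 + 456 + 789 = 1368, then mod 1000
```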

Hash functions: Middle square
- Compute k*k and pick some digits from the resulting number.
- Example: given a 9-digit key k and a hash table of size 1000, pick three digits from the middle of the number k*k.
- Works fairly well in practice if the keys do not have many leading or trailing zeroes.
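
A sketch of the middle-square example. The square of a 9-digit key has at most 18 digits, so the square is zero-padded to 18 digits here and the three digits around the center are taken; which three digits count as "the middle" is an assumption of this sketch.

```python
def middle_square_hash(key):
    """Picks three digits from the middle of key * key (9-digit keys)."""
    squared = f"{key * key:018d}"        # zero-pad the square to 18 digits
    mid = len(squared) // 2
    return int(squared[mid - 1:mid + 2])


index = middle_square_hash(123456789)
# Keys with many trailing zeroes all collapse to the same index:
z1 = middle_square_hash(100000000)
z2 = middle_square_hash(200000000)
```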

Hash functions: Division
- $h(k) = k \bmod m$
- Fast
- Not all values of m are suitable for this. For example, powers of 2 should be avoided: if $m = 2^p$, then $k \bmod m$ is just the p least significant bits of k.
- Good values for m are prime numbers.
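
The effect of the table size on the division method can be seen with a handful of patterned keys. The key set and the sizes 16 and 13 are arbitrary choices for illustration.

```python
def division_hash(k, m):
    """Division method: the remainder of k divided by m."""
    return k % m


keys = [16, 32, 48, 64, 80]                            # all multiples of 16
pow2_slots = [division_hash(k, 16) for k in keys]      # m is a power of 2
prime_slots = [division_hash(k, 13) for k in keys]     # m is prime
```

With m = 16 only the low four bits of each key matter, so every key in this set collides in slot 0; with the prime m = 13 the same keys spread over five different slots.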

Hash functions: Multiplication
- $h(k) = \lfloor m \, (kc - \lfloor kc \rfloor) \rfloor$, $0 < c < 1$
- In English:
  - Multiply the key k by a constant c, 0 < c < 1
  - Take the fractional part of kc
  - Multiply that by m
  - Take the floor of the result
- The value of m is not critical
- Some values of c work better than others
- A good value is $c = (\sqrt{5} - 1)/2 \approx 0.618$

Hash functions: Multiplication
- Example: Suppose the size of the table, m, is 1301.
  - For k=1234, h(k)=850
  - For k=1235, h(k)=353
  - For k=1236, h(k)=1157
  - For k=1237, h(k)=660
  - For k=1238, h(k)=164
  - For k=1239, h(k)=968
  - For k=1240, h(k)=471
- Nice distribution!
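
The multiplication method can be checked with a few lines of code. This sketch assumes the constant c = (sqrt(5) - 1)/2, roughly 0.618, which is a commonly recommended choice; the function name is hypothetical, and floating-point arithmetic is assumed to be accurate enough for keys of this size.

```python
import math

C = (math.sqrt(5) - 1) / 2      # assumed constant, about 0.618


def multiplication_hash(k, m, c=C):
    """floor(m * frac(k * c)), where frac is the fractional part."""
    frac = (k * c) % 1.0        # fractional part of k * c
    return math.floor(m * frac)


# Recompute the hash values for consecutive keys with m = 1301.
values = {k: multiplication_hash(k, 1301) for k in range(1234, 1241)}
```

Consecutive keys end up far apart in the table; for instance, k = 1234 maps to 850 and k = 1235 maps to 353.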

Hash functions: Universal Hashing
- Worst-case scenario: the chosen keys all hash to the same slot.
- This can be avoided if the hash function is not fixed:
  - Start with a collection of hash functions
  - Select one at random and use that
  - Good performance on average: the probability that the randomly chosen hash function exhibits the worst-case behavior is very low.
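
As a sketch of "select one at random", the code below draws a function from the family h(k) = ((a*k + b) mod p) mod m, a standard textbook choice that the slide itself does not name. The prime `P`, the function name, and the restriction to non-negative integer keys smaller than `P` are all assumptions.

```python
import random

P = 2_147_483_647   # the prime 2^31 - 1; keys are assumed to be smaller


def random_hash_function(m, rng=random):
    """Picks a and b at random and returns h(k) = ((a*k + b) mod P) mod m."""
    a = rng.randrange(1, P)     # 1 <= a < P
    b = rng.randrange(0, P)     # 0 <= b < P

    def h(k):
        return ((a * k + b) % P) % m

    return h


h = random_hash_function(1301)
slot = h(1234)
```

Because a and b are chosen when the table is created, no fixed set of keys is bad for every run; an unlucky draw can still behave poorly, but that is unlikely.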

Load factor
- Given a hash table of size m, and n elements stored in it, we define the load factor of the table as $\alpha = n/m$.
- The load factor gives us an indication of how full the table is.
- The possible values of the load factor depend on the method we use for resolving collisions.

Resolving collisions: Chaining
- Chaining (a.k.a. closed addressing)
  - Put all the elements that collide in a chain (list) attached to the slot.
- The hash table is an array of linked lists.
- The load factor indicates the average number of elements stored in a chain. It could be less than, equal to, or larger than 1.

Resolving collisions: Chaining
- Insert/Delete/Lookup take expected $O(1)$ time
  - Keep the lists doubly linked to facilitate deletions
  - The worst-case lookup time is linear
- However, this assumes that the chains are kept short.
  - If the chains start becoming too long, the table must be enlarged and all the keys rehashed.
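
The following sketch puts chaining and rehashing together. It departs from the slide in a few labeled ways: Python lists stand in for the doubly-linked chains, the built-in `hash` with the division method stands in for h, and the growth rule (resize to 2m + 1 once the load factor exceeds `max_load`) is an arbitrary assumption.

```python
class ChainedHashTable:
    """Hash table with chaining; enlarges itself when chains grow long."""

    def __init__(self, m=7, max_load=2.0):
        self.m = m                          # number of slots
        self.n = 0                          # number of stored keys
        self.max_load = max_load            # assumed resize threshold
        self.slots = [[] for _ in range(m)]

    def load_factor(self):
        return self.n / self.m

    def _chain(self, key):
        return self.slots[hash(key) % self.m]

    def lookup(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                    # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1
        if self.load_factor() > self.max_load:
            self._resize(2 * self.m + 1)

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return

    def _resize(self, new_m):
        # Enlarge the table and rehash every stored key.
        pairs = [pair for chain in self.slots for pair in chain]
        self.m = new_m
        self.slots = [[] for _ in range(new_m)]
        for k, v in pairs:
            self.slots[hash(k) % new_m].append((k, v))
```

Deleting from a Python list takes time proportional to the chain length; with a doubly-linked chain and a pointer to the node, as the slide suggests, the unlinking step itself is constant time.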

Resolving collisions: Chaining
- Assumption: simple uniform hashing
  - any given key is equally likely to hash into any of the m slots
- Analysis of unsuccessful search:
  - average time to search unsuccessfully for key k = the average time to search to the end of a chain.
  - The average length of a chain is $\alpha$.
  - Total (average) time required: $\Theta(1 + \alpha)$

Resolving collisions: Chaining
- Analysis of successful search:
  - Expected number e of elements examined during a successful search for key k = one more than the expected number of elements examined when k was inserted.
    - it makes no difference whether we insert at the beginning or the end of the list.
  - Take the average, over the n items in the table, of 1 plus the expected length of the chain to which the i-th element was added. That chain held on average $(i-1)/m$ elements, so:
    $e = \frac{1}{n} \sum_{i=1}^{n} \left( 1 + \frac{i-1}{m} \right) = 1 + \frac{\alpha}{2} - \frac{1}{2m}$

Resolving collisions: Chaining
- Total time: $\Theta(1 + \alpha)$

Resolving collisions: Chaining
- Both types of search take $\Theta(1 + \alpha)$ time on average.
- If $n = O(m)$, then $\alpha = O(1)$ and the total time for Search is $O(1)$ on average.
- Insert: $O(1)$ in the worst case
- Delete: $O(1)$ in the worst case (given a pointer to the element in a doubly-linked chain)

Resolving collisions: Chaining
- Storage for the elements may be allocated and deallocated within the hash table itself by linking all unused slots into a free list.
- Insert:
  - if key k hashes into an empty slot h(k), put it there and set a flag to indicate that this is the actual position to which the element hashed.
  - if h(k) is not empty, and the element $k_1$ it contains has its flag set, then use a slot off the free list to store $k_1$. Its flag should be unset.
  - if h(k) is not empty, and the element $k_1$ it contains has its flag unset, then move $k_1$ to another slot and store k in h(k).