Hash Functions
Andy Wang
Data Structures, Algorithms, and Generic Programming

Introduction
- Hash function
  - Maps keys to integers (buckets)
    - Hash(Key) = Integer
  - Ideally in a random-like manner
    - Evenly distributed bucket values
    - Even if the input data is not evenly distributed

An Example
- ID number generation
  - Key = your name
  - Hash(Key) = a number
- Not a great hash function
  - Two people with the same name will have the same number

Simple Hash Functions
- Assumptions:
  - K: an unsigned 32-bit integer
  - M: the number of buckets (the number of entries in a hash table)
- Goal:
  - If one bit of K is changed, every bit of Hash(K) is equally likely to change

A Simple Hash Function
- What if K = M?
- Hash(K) = K
- What is wrong?
- Your student ID = SSN
  - I can't use your SSN to post your grades

Another Simple Function
- If K > M
- Hash(K) = K % M
- What is wrong?
- Suppose M = 4, K = 2, 4, 6, 8
- K % M = 2, 0, 2, 0

Yet Another Simple Function
- If K > P, P = prime number
- Hash(K) = K % P
- Suppose P = 3, K = 2, 4, 6, 8
- K % P = 2, 1, 0, 2
- More uniform distribution, but still problematic for other cases

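The difference between the two modulo choices can be checked by counting how many keys land in each bucket. The following is a minimal sketch; the helper name `count_buckets` and its signature are assumptions made for illustration, not part of the slides.

```c
#include <stddef.h>

/* Hypothetical helper: count how many of the n keys fall into each of
 * `buckets` slots under Hash(K) = K % buckets.
 * `counts` must have room for `buckets` entries. */
void count_buckets(const unsigned int *keys, size_t n,
                   unsigned int buckets, unsigned int *counts)
{
    for (unsigned int b = 0; b < buckets; b++)
        counts[b] = 0;
    for (size_t i = 0; i < n; i++)
        counts[keys[i] % buckets]++;
}
```

With the keys 2, 4, 6, 8, a bucket count of 4 uses only buckets 0 and 2, while a bucket count of 3 touches all three buckets.
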
More on Prime Numbers
- K > P1 > P2, where P1 and P2 are prime numbers
- Hash(K) = (K % P1) % P2
- Suppose P1 = 5, P2 = 3, K = 2, 4, 6, 8, 10
- (K % 5) = 2, 4, 1, 3, 0
- (K % 5) % 3 = 2, 1, 1, 0, 0
- Still uniform distribution

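The two-prime formula is a one-liner in C. This sketch assumes the caller supplies primes with P1 > P2; the function name `hash_two_primes` is made up for the example.

```c
/* Hash(K) = (K % P1) % P2.
 * Assumption: p1 and p2 are primes with p1 > p2; this is not checked. */
unsigned int hash_two_primes(unsigned int k, unsigned int p1, unsigned int p2)
{
    return (k % p1) % p2;
}
```
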
Polynomial Functions
- If K > P, P = prime number
- Hash(K) = K(K + 3) % P
- Slightly better than pure modulo functions

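One practical detail when coding the polynomial: for a 32-bit K, the product K(K + 3) can exceed 32 bits (and, for the largest keys, even 64 bits), so a direct translation would silently wrap. The sketch below assumes the intended result is the mathematical value of K(K + 3) mod P, and reduces both factors modulo P first so the product fits in 64 bits. The name `hash_poly` is hypothetical.

```c
#include <stdint.h>

/* Hash(K) = K(K + 3) % P, computed without overflow.
 * Assumption: p is a nonzero prime supplied by the caller. */
uint32_t hash_poly(uint32_t k, uint32_t p)
{
    uint64_t a = k % p;          /* K mod P       */
    uint64_t b = (a + 3u) % p;   /* (K + 3) mod P */
    return (uint32_t)((a * b) % p);
}
```
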
How About…
- Hash(K) = rand()
- What is wrong?
- Not repeatable

How About…
- K > P, P = prime number
- Hash(K) = rand(K) % P
- Better randomness
- Can be expensive to compute random numbers

Pre-generated Randomness
- Two prime numbers: P1 and P2
- K > P1 and K > P2
- A table R[P1], with R[i] pre-initialized to rand(i) % P2
- Hash(K) = R[K % P1]
- Slight problem: possible duplicate mapping

To Avoid Duplicate Mapping…
- Two prime numbers: P1 and P2
- K > P1 and K > P2
- A table R[P1], with R[i] pre-initialized to unique random numbers
- Hash(K) = R[K % P1]

An Example
- K = 0 … 2^32 - 1, P1 = 3, P2 = 5
- R[3] = {0, 4, 1}
- Hash(K) = R[K % 3]

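A table of unique random entries can be produced with a partial shuffle. The sketch below is one possible way to do it, not necessarily how the slides' table was generated: it assumes the entries are drawn from 0 … P2 - 1 (so P1 <= P2 is required), uses the standard `srand`/`rand` with a caller-chosen seed so the table is repeatable, and ignores the small modulo bias of `rand() % n`. The names `fill_unique_table` and `hash_lookup` are hypothetical.

```c
#include <stdlib.h>

/* Fill r[0..p1-1] with distinct values drawn from 0..p2-1.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int fill_unique_table(unsigned int *r, unsigned int p1, unsigned int p2,
                      unsigned int seed)
{
    if (p1 == 0 || p1 > p2)
        return -1;
    unsigned int *pool = malloc(p2 * sizeof *pool);
    if (pool == NULL)
        return -1;
    for (unsigned int i = 0; i < p2; i++)
        pool[i] = i;
    srand(seed);
    /* Partial Fisher-Yates shuffle: only the first p1 slots are settled. */
    for (unsigned int i = 0; i < p1; i++) {
        unsigned int j = i + (unsigned int)rand() % (p2 - i);
        unsigned int tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
        r[i] = pool[i];
    }
    free(pool);
    return 0;
}

/* Hash(K) = R[K % P1] */
unsigned int hash_lookup(const unsigned int *r, unsigned int p1, unsigned int k)
{
    return r[k % p1];
}
```

With the table {0, 4, 1} from the example, `hash_lookup` maps K = 8 to R[2] = 1.
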
Hashing a Sequence of Keys
- K = {K1, K2, …, Kn}
- E.g., Hash("test") = 98157
- Design principles
  - Use the entire key
  - Use the ordering information
  - Use pre-generated randomness

Use the Entire Key

    unsigned int Hash(const char *Key) {
        unsigned int hash = 0;
        /* Walk every character up to the terminating '\0'. */
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ (unsigned char)Key[j];
        }
        return hash;
    }

- Problem: Hash("ab") == Hash("ba")

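The ordering problem follows from XOR being commutative: the characters can be combined in any order with the same result. The sketch below restates the XOR-only hash under a hypothetical name, `hash_xor`, so the collision can be checked directly.

```c
#include <stddef.h>

/* XOR of all characters in a NUL-terminated key.
 * Order-insensitive by construction, since XOR is commutative. */
unsigned int hash_xor(const char *key)
{
    unsigned int hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++)
        hash ^= (unsigned char)key[j];
    return hash;
}
```

Both `hash_xor("ab")` and `hash_xor("ba")` evaluate to `'a' ^ 'b'`, and a key such as "aa" hashes to 0 because repeated characters cancel.
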
Use the Ordering Information

    unsigned int Hash(const char *Key) {
        unsigned int hash = 0;
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ (unsigned char)Key[j];
            hash = /* hash with some shiftings */;
        }
        return hash;
    }

- Problem: H(short keys) will not perturb all 32 bits (clustering)

Use Pre-generated Randomness

    unsigned int Hash(const char *Key) {
        unsigned int hash = 0;
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            /* Cast so a negative char never indexes before R[0]. */
            hash = hash ^ R[(unsigned char)Key[j]];
            hash = /* hash with some shiftings */;
        }
        return hash;
    }

CRC Variant
- Do a 5-bit circular shift of hash
- XOR hash and K[j]

    …
    for (…) {
        highorder = hash & 0xf8000000;
        hash = hash << 5;
        hash = hash ^ (highorder >> 27) ^ K[j];
    }
    …

CRC Variant
+ For long keys, all 32 bits are exercised
+ More randomness toward lower bits
- Not all bits are changed for short keys

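Filling in the elided loop, the CRC variant might look like the sketch below. The function name `hash_crc_variant`, the NUL-terminated string key, and the use of `uint32_t` to pin the hash at exactly 32 bits are assumptions added for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-variant hash: rotate the hash left by 5 bits, then XOR in the
 * next character of the key. */
uint32_t hash_crc_variant(const char *key)
{
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        uint32_t highorder = hash & 0xf8000000u; /* save the top 5 bits   */
        hash <<= 5;                              /* shift them out        */
        hash ^= highorder >> 27;                 /* wrap them to the low end */
        hash ^= (unsigned char)key[j];           /* mix in the character  */
    }
    return hash;
}
```

For the two-character keys "ab" and "ba" this yields 0xC42 and 0xC21, so ordering now matters. Both values stay below 2^12, which illustrates the short-key weakness: the upper bits are never touched.
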
BUZ Hash
- Set up an array R to store precomputed random numbers

    …
    for (…) {
        highorder = hash & 0x80000000;
        hash = hash << 1;
        hash = hash ^ (highorder >> 31) ^ R[K[j]];
    }
    …

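A complete version of the BUZ hash needs some way to fill R. The sketch below assumes a 256-entry table (one entry per byte value) filled by a xorshift32 generator with a fixed seed, chosen here only so the table is repeatable; the slides do not say how R is generated. The names `buz_init` and `hash_buz` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t R[256]; /* one precomputed random word per byte value */

/* Assumption: fill R with a xorshift32 sequence from a fixed nonzero
 * seed, so every run produces the same table. */
void buz_init(void)
{
    uint32_t x = 2463534242u;
    for (int i = 0; i < 256; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        R[i] = x;
    }
}

/* BUZ hash: rotate the hash left by 1 bit, then XOR in the random
 * word selected by the next character. Call buz_init() first. */
uint32_t hash_buz(const char *key)
{
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        uint32_t highorder = hash & 0x80000000u; /* save the top bit   */
        hash <<= 1;
        hash ^= highorder >> 31;                 /* wrap it to bit 0   */
        hash ^= R[(unsigned char)key[j]];
    }
    return hash;
}
```

Because each character contributes a full 32-bit random word, even a one-character key can affect every bit of the result.
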
References
- Aho, Sethi, and Ullman. Compilers: Principles, Techniques, and Tools, 1986.
- Cormen, Leiserson, and Rivest. Introduction to Algorithms, 1990.
- Knuth. The Art of Computer Programming, 1973.
- Kuenning. Hash Functions, 2003.