Introduction to Hashing: Hash Functions


• Sections 5.1, 5.2, and 5.6

Hashing
• Data items are stored in an array of some fixed size
  – Hash table
• Search is performed using some part of the data item
  – Key
• Used for performing insertions, deletions, and finds in constant average time
• Operations requiring ordering information are not supported efficiently
  – Such as findMin, findMax

Hash Table Example (figure)

Hash Table Applications
• Comparing search efficiency of different data structures:
  – Vector, list: O(N)
  – AVL search tree: O(log N)
  – Hash table: O(1) expected time
• Compilers, to keep track of declared variables
  – Symbol tables
  – Mapping from name to id
• On-line spelling checkers
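The compiler use case can be sketched with std::unordered_map, the standard hash-table container mentioned on the last slide. This is only an illustration: the class name SymbolTable, its two methods, and the choice to hand out ids in declaration order are assumptions, not something the slides specify.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical symbol table: maps each declared name to a small integer id.
class SymbolTable {
public:
    // Returns the id for `name`, assigning the next free id on first sight.
    int declare(const std::string& name) {
        auto it = ids_.find(name);  // O(1) expected time
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(ids_.size());
        ids_.emplace(name, id);
        return id;
    }

    // Returns the id for `name`, or -1 if it was never declared.
    int lookup(const std::string& name) const {
        auto it = ids_.find(name);
        return it == ids_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::string, int> ids_;
};
```

Declaring "x", then "y", then "x" again would return the ids 0, 1, and 0; looking up an undeclared name returns -1.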

Hash Functions
• Map keys to integers (which represent table indices)
  – Hash(Key) = Integer
  – Evenly distributed index values
    • Even if the input data is not evenly distributed
• What happens if multiple keys are mapped to the same integer (same position)?
  – Collision management (discussed in detail later)
  – Collisions are likely to be reduced if keys are evenly distributed over the hash table

Simple Hash Functions
• Assumptions:
  – K: an unsigned 32-bit integer
  – M: the number of buckets (the number of entries in a hash table)
• Goal:
  – If a bit is changed in K, all bits are equally likely to change for Hash(K)
  – So that items are evenly distributed in the hash table

A Simple Function
• What if
  – Hash(K) = K % M
  – Where M is an arbitrary integer
• What is wrong?
• Values of K may not be evenly distributed
  – But Hash(K) needs to be evenly distributed
• Suppose
  – M = 10
  – K = 10, 20, 30, 40
• Then K % M = 0, 0, 0, 0: every key lands in bucket 0

Another Simple Function
• If
  – Hash(K) = K % P, P = prime number
• Suppose
  – P = 11
  – K = 10, 20, 30, 40
• K % P = 10, 9, 8, 7
• More uniform distribution
• So hash tables often have a prime number of entries
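The effect of the table size on the modulo hash can be checked directly. The helper below is a small sketch; the name bucketsFor is hypothetical, and it simply applies Hash(K) = K % m to each key so that the table sizes 10 and 11 can be compared on the same keys.

```cpp
#include <cassert>
#include <vector>

// Returns Hash(K) = K % m for every key, in the order given.
std::vector<unsigned> bucketsFor(const std::vector<unsigned>& keys, unsigned m) {
    assert(m > 0);  // a table needs at least one bucket
    std::vector<unsigned> out;
    out.reserve(keys.size());
    for (unsigned k : keys) out.push_back(k % m);
    return out;
}
```

For the keys 10, 20, 30, 40, calling bucketsFor with m = 10 yields 0, 0, 0, 0, while m = 11 yields 10, 9, 8, 7.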

A Simple Hash for Strings
    // Assumes #include <string> and using std::string
    unsigned int Hash(const string& Key) {
        unsigned int hash = 0;
        // Sum the character codes of the key
        for (size_t j = 0; j != Key.size(); ++j) {
            hash += Key[j];
        }
        return hash;
    }
• Problem: short keys produce small sums, so they may not use a large fraction of a large hash table
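To make the problem concrete, the sketch below pairs the additive hash with a helper that bounds its result. The names additiveHash and maxAdditiveHash are hypothetical, and the bound assumes 7-bit ASCII keys (largest character code 127), which the slide does not state.

```cpp
#include <cstddef>
#include <string>

// The additive hash from the slide: sums the character codes of the key.
unsigned int additiveHash(const std::string& key) {
    unsigned int hash = 0;
    for (unsigned char c : key) hash += c;
    return hash;
}

// Upper bound on additiveHash for keys of at most maxLen characters,
// assuming 7-bit ASCII (largest character code 127).
unsigned int maxAdditiveHash(std::size_t maxLen) {
    return static_cast<unsigned int>(maxLen) * 127u;
}
```

Under that assumption, keys of up to 8 characters can never hash above 8 × 127 = 1016, so a table with, say, 10,007 entries would leave most of its slots unreachable.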

Another Simple Hash Function
    // Assumes Key has at least three characters
    unsigned int Hash(const string& Key) {
        return Key[0] + 27*Key[1] + 729*Key[2];
    }
• Problem: English does not use random strings, so the hash values are not uniformly distributed
  – Using more characters of the key can improve the hash function

A Better Hash Function
    unsigned int Hash(const string& Key) {
        unsigned int hash = 0;
        // Horner's rule: multiply the running value by 37, then add the next character
        for (size_t j = 0; j != Key.size(); ++j)
            hash = 37*hash + (Key[j] - 'a' + 1);
        return hash % TableSize;
    }
• The for loop computes $\sum_{i=0}^{n} a_i \, 37^{n-i}$ using Horner's rule, where the key has $n+1$ characters and $a_i$ has the value 1 for 'a', 2 for 'b', etc.
  – $a_3 + 37 a_2 + 37^2 a_1 + 37^3 a_0 = 37(37(37 a_0 + a_1) + a_2) + a_3$
• The for loop implicitly performs arithmetic modulo $2^k$, where $k$ is the number of bits in an unsigned int
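The equivalence between Horner's rule and the explicit polynomial can be verified with a short sketch. Both function names (hornerHash, directHash) are hypothetical, and the final reduction modulo the table size is left out so that the two raw values can be compared; since unsigned arithmetic wraps modulo 2^k, the two agree even when the sum overflows.

```cpp
#include <cstddef>
#include <string>

// Horner's rule: one multiply and one add per character.
unsigned int hornerHash(const std::string& key) {
    unsigned int hash = 0;
    for (char c : key) hash = 37 * hash + static_cast<unsigned int>(c - 'a' + 1);
    return hash;
}

// Direct evaluation of sum_{i=0}^{n} a_i * 37^(n-i), for comparison.
unsigned int directHash(const std::string& key) {
    unsigned int sum = 0;
    for (std::size_t i = 0; i < key.size(); ++i) {
        unsigned int power = 1;  // becomes 37^(n-i), where n = key.size() - 1
        for (std::size_t p = i + 1; p < key.size(); ++p) power *= 37;
        sum += static_cast<unsigned int>(key[i] - 'a' + 1) * power;
    }
    return sum;
}
```

For the key "abc" both functions give 37² · 1 + 37 · 2 + 3 = 1446. The direct version needs a quadratic number of multiplications in the key length, whereas Horner's rule needs only one per character.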

STL Hash Tables
• STL extensions
  – hash_set
  – hash_map
• The key type, hash function, and equality operator may need to be provided
• Available in the new standard as unordered_set and unordered_map
  – <tr1/unordered_map> or <unordered_map>
  – <tr1/unordered_set> or <unordered_set>
• Example: Lec24/hashmapex.cpp
  – Reference
    • www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html
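As a rough illustration of supplying the key type, hash function, and equality operator, the sketch below uses std::unordered_map from <unordered_map>. It is not the contents of hashmapex.cpp, which are not shown here; the Point type, the functor names, and the way the two coordinates are combined are all assumptions made for the example.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type used only for this illustration.
struct Point {
    int x;
    int y;
};

// Equality operator for the key type.
struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

// Hash function for the key type: combines the two coordinate hashes with
// the multiply-by-37 step used in the string hash earlier.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t h = std::hash<int>()(p.x);
        return 37 * h + std::hash<int>()(p.y);
    }
};

// Map from points to names, using the custom hash and equality functors.
using PointNames = std::unordered_map<Point, std::string, PointHash, PointEqual>;
```

A PointNames object then supports the usual operations, for example names[Point{1, 2}] = "a", names.find(Point{1, 2}), and names.erase(Point{1, 2}), each in constant expected time.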