Hash Tables
© Rick Mercer

Hash Tables
A "faster" implementation of a Map

Outline
- Discuss what a hash method does
    - translates a string key into an integer
- Discuss a few strategies for implementing a hash table
    - linear probing
    - quadratic probing
    - separate chaining hashing

Big-O Complexity for Various Map Implementations

    Data Structure          put     get     delete
    Unsorted Array          ____    ____    ____
    Sorted Array            ____    ____    ____
    Unsorted Linked List    ____    ____    ____
    Sorted Linked List      ____    ____    ____
    Binary Search Tree      ____    ____    ____

Hash Tables
- Hash table: another structure for storing data
    - Provides virtually direct access to objects based on a key (a unique String or Integer)
        - the key could be your SID, your telephone number, social security number, account number, ...
        - keys must be unique
        - each key is associated with (mapped to) an object

Hashing
- Must convert keys such as "555-1234" into an integer index from 0 to some reasonable size
- Elements can be found, inserted, and removed using the integer index as an array index
- Insert (called put), find (get), and remove must use the same "address calculator"
    - which we call the hash function
- Good structure for implementing a dictionary

Hashing
- Can make a String or Integer object into a key by "hashing" the key to get an int
- Ideally, every key has a unique hash value
    - Then the hash value could be used as an array index; however,
        - you cannot rely on every key "hashing" to a unique integer
        - but you can usually get close enough
        - you still need a way to handle "collisions"
- "abc" may hash to the same int as "cba" if a lousy hash function is used
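
The "abc" versus "cba" collision can be checked directly. The following is a minimal sketch, assuming a "lousy" hash that simply adds up the char values; the class name AnagramCollisionSketch and the method name sumHash are hypothetical and not from the slides. It compares that hash with Java's built-in String.hashCode, which takes character order into account.

```java
// Sketch only: the class and method names here are hypothetical.
final class AnagramCollisionSketch {

    // A "lousy" hash: add up the char values, ignoring their order.
    static int sumHash(String key) {
        int sum = 0;
        for (int j = 0; j < key.length(); j++) {
            sum += key.charAt(j);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same letters in a different order give the same sum: a collision.
        System.out.println(sumHash("abc") == sumHash("cba"));
        // String.hashCode weights each position, so these two keys differ.
        System.out.println("abc".hashCode() == "cba".hashCode());
    }
}
```

Any hash that ignores character order collides on every pair of anagrams, which is why a collision strategy is still needed.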

Hash Tables: Runtime Efficient
- Lookup time does not grow when n increases (on average, given a good hash function and a table that is not too full)
- A hash table supports
    - fast retrieval O(1)
    - fast deletion O(1)
    - fast insertion O(1)
- Could use String keys: each ASCII character equals some unique integer
    - "able" = 97 + 98 + 108 + 101 = 404

Hash method works something like this
Convert a String key into an integer that will be in the range of 0 through the maximum capacity - 1
Assume the array capacity is 9997

    "AAAA" --> hash(key) --> 8482
    "zzzz" --> hash(key) --> 1273

Domain: a string of 8 chars
Range: 0 ... 9996

Hash method
- What if the ASCII values of the individual chars of the string key added up to a number from 65 ("A") to possibly 488 ("zzzz"), 4 chars max
- If the array has size = 309, mod the sum
      390 % TABLE_SIZE = 81
      394 % TABLE_SIZE = 85
      404 % TABLE_SIZE = 95
- These array indices store these keys
      Index   Key
      81      abba
      85      abcd
      95      able

A terrible hash method

    @Test
    public void testHash() {
        assertEquals(81, hash("abba"));
        assertEquals(81, hash("baab"));   // same letters as "abba", same sum
        assertEquals(85, hash("abcd"));
        assertEquals(86, hash("abce"));
        assertEquals(308, hash("IKLT"));
        assertEquals(308, hash("KLMP"));  // different letters, same sum
    }

    private final int TABLE_SIZE = 309;

    // Return an int in the range of 0..TABLE_SIZE-1
    public int hash(String key) {
        int result = 0;
        int n = key.length();
        for (int j = 0; j < n; j++)
            result += key.charAt(j); // add up the characters
        return result % TABLE_SIZE;
    }

Collisions
- A good hash method
    - executes quickly
    - distributes keys equitably
- But you still have to handle collisions when two keys have the same hash value
    - the hash method is not guaranteed to return a unique integer for each key
        - example: the simple hash method with "baab" and "abba"
- There are several ways to handle collisions
    - let us first examine linear probing

Linear Probing: Dealing with Collisions
- Collision: when an element to be inserted hashes to an array position that is already occupied
- Linear probing: search sequentially for an unoccupied position, using wraparound

A hash table after three insertions using the too simple hash code method
Insert objects with these three keys: "abba", "abcd", "abce"

    Index   Key
    0
    ...
    80
    81      "abba"
    82
    83
    84
    85      "abcd"
    86      "abce"
    ...
    308

Collision occurs while inserting "baab"
Can't insert "baab" where it hashes to: that is the same slot as "abba"
Linear probe forward by 1, inserting it at the next available slot
"baab": try [81], put in [82]

    Index   Key
    0
    ...
    80
    81      "abba"
    82      "baab"
    83
    84
    85      "abcd"
    86      "abce"
    ...
    308

Wrap around when collision occurs at end
Insert "KLMP" and "IKLT", both of which have a hash value of 308
"KLMP" goes in [308]; "IKLT" collides there and wraps around to [0]

    Index   Key
    0       "IKLT"
    ...
    80
    81      "abba"
    82      "baab"
    83
    84
    85      "abcd"
    86      "abce"
    ...
    308     "KLMP"

Find object with key "baab"
"baab" still hashes to 81, but since [81] does not hold it, linear probe to [82]
At this point, you could return a reference to it or remove it

    Index   Key
    0       "IKLT"
    ...
    80
    81      "abba"
    82      "baab"
    83
    84
    85      "abcd"
    86      "abce"
    ...
    308     "KLMP"
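
The insert and find walk-through with these six keys can be reproduced with a small sketch. This is a minimal illustration, assuming the table stores only String keys and uses null for a never-occupied slot; the class name LinearProbeSketch and the methods put and indexOf are hypothetical names, not the implementation from the slides.

```java
// Sketch only: class and method names are hypothetical, not from the slides.
final class LinearProbeSketch {
    private static final int TABLE_SIZE = 309;
    private final String[] keys = new String[TABLE_SIZE]; // null means never occupied

    // The "too simple" hash: sum of the char values, mod the table size.
    static int hash(String key) {
        int sum = 0;
        for (int j = 0; j < key.length(); j++) {
            sum += key.charAt(j);
        }
        return sum % TABLE_SIZE;
    }

    // Inserts key and returns the slot used, or -1 if the table is full.
    int put(String key) {
        int start = hash(key);
        for (int step = 0; step < TABLE_SIZE; step++) {
            int pos = (start + step) % TABLE_SIZE; // wrap around past the end
            if (keys[pos] == null || keys[pos].equals(key)) {
                keys[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Returns the slot holding key, or -1 if the key is absent.
    int indexOf(String key) {
        int start = hash(key);
        for (int step = 0; step < TABLE_SIZE; step++) {
            int pos = (start + step) % TABLE_SIZE;
            if (keys[pos] == null) {
                return -1; // an empty slot ends the search
            }
            if (keys[pos].equals(key)) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbeSketch t = new LinearProbeSketch();
        for (String k : new String[] {"abba", "abcd", "abce", "baab", "KLMP", "IKLT"}) {
            System.out.println(k + " -> [" + t.put(k) + "]");
        }
        System.out.println("baab found at [" + t.indexOf("baab") + "]");
    }
}
```

The expression (start + step) % TABLE_SIZE is what provides the wraparound: a probe that runs past slot 308 continues at slot 0.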

HashMap put with linear probing

    public class HashTable<K, V> {

        private class HashTableNode {
            private K key;
            private V data;
            boolean active;

            private HashTableNode() {
                key = null;
                data = null;
                active = false;
            }
        }

        private final static int TABLE_SIZE = 11;
        private Object[] table;

        public HashTable() {
            table = new Object[TABLE_SIZE];
            // Start with an inactive node in every slot
            for (int i = 0; i < TABLE_SIZE; i++) {
                table[i] = new HashTableNode();
            }
        }
    }

Find and Remove an element
- Follow the same path to find an item
    - If the linear search reaches an empty hash table slot, the item is not in the table and the search is done
- To remove an element, follow the same path
    - If found, mark the element deleted somehow
- Three possible states when looking at slots
    - the slot was never occupied
    - the slot is occupied: if it matches, stop; otherwise proceed to the next slot
    - the slot was occupied, but nothing is there now (removed)
        - We could call this a tombStoned slot

Linear Probe Implementation
- Could have a linear probing, array based implementation
    - perhaps set all nodes to null at first and make sure you set the node to null after removing
    - or perhaps each array element will have an object that stores a reference to the data along with some boolean instance variables to indicate which of the three states it is in (active, avail, or tombStoned), to allow linear probes past removed elements
- Or could have a linear probing, linked list implementation with no wraparound (later)
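
The three slot states can be sketched as follows. This is only an illustration under stated assumptions: it uses an enum instead of boolean instance variables, keeps keys and states in two parallel arrays, and reuses the summing hash with a table size of 11. The names TombstoneSketch, State, find, put, and remove are hypothetical.

```java
import java.util.Arrays;

// Sketch only: names are hypothetical; the keys double as the stored data.
final class TombstoneSketch {
    enum State { AVAIL, ACTIVE, TOMBSTONED }

    private static final int TABLE_SIZE = 11;
    private final String[] keys = new String[TABLE_SIZE];
    private final State[] states = new State[TABLE_SIZE];

    TombstoneSketch() {
        Arrays.fill(states, State.AVAIL); // every slot starts out never occupied
    }

    static int hash(String key) {
        int sum = 0;
        for (int j = 0; j < key.length(); j++) {
            sum += key.charAt(j);
        }
        return sum % TABLE_SIZE;
    }

    // Returns the slot holding key, or -1 if it is not in the table.
    int find(String key) {
        int start = hash(key);
        for (int step = 0; step < TABLE_SIZE; step++) {
            int pos = (start + step) % TABLE_SIZE;
            if (states[pos] == State.AVAIL) {
                return -1; // never occupied: the key cannot be further along
            }
            if (states[pos] == State.ACTIVE && keys[pos].equals(key)) {
                return pos;
            }
            // Tombstoned slot or a different key: keep probing.
        }
        return -1;
    }

    // Puts key in the first slot that is not active; false if present or full.
    boolean put(String key) {
        if (find(key) >= 0) {
            return false;
        }
        int start = hash(key);
        for (int step = 0; step < TABLE_SIZE; step++) {
            int pos = (start + step) % TABLE_SIZE;
            if (states[pos] != State.ACTIVE) {
                keys[pos] = key;
                states[pos] = State.ACTIVE;
                return true;
            }
        }
        return false;
    }

    // Marks the slot as tombstoned rather than resetting it to AVAIL.
    boolean remove(String key) {
        int pos = find(key);
        if (pos < 0) {
            return false;
        }
        keys[pos] = null;
        states[pos] = State.TOMBSTONED;
        return true;
    }

    public static void main(String[] args) {
        TombstoneSketch t = new TombstoneSketch();
        t.put("abba");
        t.put("baab"); // collides with "abba" and lands one slot later
        t.remove("abba");
        System.out.println(t.find("baab")); // still reachable past the tombstone
    }
}
```

If remove reset the slot to AVAIL instead, the search for "baab" would stop at the hole left by "abba" and wrongly report that the key is missing.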

Array based implementation has Clustering Problem
- Used slots tend to cluster with linear probing

Quadratic Probing
- Quadratic probing eliminates the primary clustering problem
- Assume hVal is the value of the hash function
    - Instead of linear probing, which searches for an open slot in a linear fashion like this
          hVal + 1, hVal + 2, hVal + 3, hVal + 4, ...
    - probe at offsets that grow as squares
          hVal + 1^2, hVal + 2^2, hVal + 3^2, hVal + 4^2, ...

Does it work?
- Quadratic probing works if
    - the table size is prime
        - studies show the prime numbered table size removes some of the non-randomness of hash functions
    - and the table is never more than half full
        - probes 1, 4, 9, 16, 25, 36, 49, ... slots away
- So make your table twice as big as you need
    - insert, find, remove are O(1)
    - 4*n bytes additional memory required for unused array locations
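
The probe sequence can be sketched in a few lines. This is a minimal illustration, assuming a hash value hVal and a prime table size; the class name QuadraticProbeSketch and the method probeSequence are hypothetical.

```java
import java.util.Arrays;

// Sketch only: names are hypothetical, not from the slides.
final class QuadraticProbeSketch {

    // Slots visited by the first `count` probes: hVal, hVal + 1^2, hVal + 2^2, ...
    static int[] probeSequence(int hVal, int tableSize, int count) {
        int[] slots = new int[count];
        for (int i = 0; i < count; i++) {
            slots[i] = (hVal + i * i) % tableSize; // wrap around with %
        }
        return slots;
    }

    public static void main(String[] args) {
        // Prime table size 11, hash value 5, first 6 probes.
        System.out.println(Arrays.toString(probeSequence(5, 11, 6)));
    }
}
```

With a prime table size of 11, the first six probes land on six different slots, which is why a table that is no more than half full always has an open slot within reach.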

Separate Chaining Hashing
- Separate chaining hashing is an alternative to probing
- Maintain an array of linked lists
- Always hash to the same place and insert at the beginning (or end) of the linked list
    - The linked list needs add and remove methods
    - Or could use java.util.LinkedList

An Array of LinkedList Objects Implementation
- An array of linked lists

    [Figure: array slots 0, 1, 2, ... each referring to a linked list; the values 321 and 365 appear as list elements]
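
An array of java.util.LinkedList objects can be sketched as below. This is an illustration under stated assumptions, not the class used in the slides: keys are Strings, values are doubles standing in for account balances, the hash is the summing hash with a table size of 11, and the names ChainingSketch, Entry, and chainLength are hypothetical.

```java
import java.util.LinkedList;

// Sketch only: names are hypothetical; values stand in for account balances.
final class ChainingSketch {

    private static final class Entry {
        final String key;
        double value;

        Entry(String key, double value) {
            this.key = key;
            this.value = value;
        }
    }

    private static final int TABLE_SIZE = 11;
    private final LinkedList<Entry>[] table;

    @SuppressWarnings("unchecked")
    ChainingSketch() {
        // Java does not allow generic array creation, so a raw array is used.
        table = new LinkedList[TABLE_SIZE];
        for (int j = 0; j < TABLE_SIZE; j++) {
            table[j] = new LinkedList<>();
        }
    }

    static int hash(String key) {
        int sum = 0;
        for (int j = 0; j < key.length(); j++) {
            sum += key.charAt(j);
        }
        return sum % TABLE_SIZE;
    }

    // Always hash to the same list; replace an existing key or add at the front.
    void put(String key, double value) {
        LinkedList<Entry> chain = table[hash(key)];
        for (Entry e : chain) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        chain.addFirst(new Entry(key, value));
    }

    // Scans only the one list the key hashes to; null if the key is absent.
    Double get(String key) {
        for (Entry e : table[hash(key)]) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    int chainLength(int index) {
        return table[index].size();
    }

    public static void main(String[] args) {
        ChainingSketch h = new ChainingSketch();
        h.put("abba", 100.00);
        h.put("baab", 400.00); // same slot as "abba", so the list grows
        System.out.println(h.get("baab") + " in a chain of " + h.chainLength(hash("baab")));
    }
}
```

A collision here never moves a key to another slot; it only lengthens one list, so no wraparound or tombstones are needed.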

HashTable<K, V>

    public class HashTable {

        private class HashTableNode {
            private String key;
            private Object data;
            private boolean active;
        }

        private final static int TABLE_SIZE = 11;
        private HashTableNode[] table;

        public HashTable() {
            table = new HashTableNode[TABLE_SIZE];
            for (int j = 0; j < TABLE_SIZE; j++) {
                table[j] = new HashTableNode();
                table[j].active = false;
            }
        }

        // Same hash method discussed in class
        // that sums the char values in the key
        private int hash(String key) {
            // return 0..TABLE_SIZE-1
            int result = 0;
            for (int j = 0; j < key.length(); j++)
                result += key.charAt(j);
            return result % TABLE_SIZE;
        }
    }

Put is Easy

    // Precondition: the object x is unique
    public void put(K key, V x) {
        HashNode hn = new HashNode(key, x);
        int pos = hash((String) key);
        table[pos].insertElementAt(0, hn); // add at the front of the list
    }

    // Return an int in the range of 0..TABLE_SIZE-1
    public int hash(String key) {
        int result = 0;
        int n = key.length();
        for (int j = 0; j < n; j++)
            result += key.charAt(j); // add up the chars
        return result % TABLE_SIZE;
    }

Insert Six Objects

    @Test
    public void testPutAndGet() {
        MyHashTable<String, BankAccount> h = new MyHashTable<String, BankAccount>();

        BankAccount a1 = new BankAccount("abba", 100.00);
        BankAccount a2 = new BankAccount("abcd", 200.00);
        BankAccount a3 = new BankAccount("abce", 300.00);
        BankAccount a4 = new BankAccount("baab", 400.00);
        BankAccount a5 = new BankAccount("KLMP", 500.00);
        BankAccount a6 = new BankAccount("IKLT", 600.00);

        // Insert BankAccount objects using ID as the key
        h.put(a1.getID(), a1);
        h.put(a2.getID(), a2);
        h.put(a3.getID(), a3);
        h.put(a4.getID(), a4);
        h.put(a5.getID(), a5);
        h.put(a6.getID(), a6);

        System.out.println(h.toString());
    }

The output when TABLE_SIZE == 11

    0. [IKLT=IKLT $600.00, KLMP=KLMP $500.00]
    1. []
    2. []
    3. []
    4. []
    5. [baab=baab $400.00, abba=abba $100.00]
    6. []
    7. []
    8. []
    9. [abcd=abcd $200.00]
    10. [abce=abce $300.00]

A Better Hash method
Use the key's hashCode method; see java.lang.String.hashCode

    // Return an int in the range of 0..TABLE_SIZE-1
    public int hash(K key) {
        // hashCode() can be negative; floorMod keeps the index non-negative
        return Math.floorMod(key.hashCode(), TABLE_SIZE);
    }
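
One caveat when turning hashCode() into an array index: hashCode() may return a negative int, and Java's % operator keeps the sign of its left operand, so a plain remainder can fall outside 0..TABLE_SIZE-1. The following is a minimal sketch of one way to guard against that, using the standard Math.floorMod method; the class name HashIndexSketch and the method name index are hypothetical.

```java
// Sketch only: the class and method names are hypothetical.
final class HashIndexSketch {

    // Maps any key's hashCode into the range 0..tableSize-1.
    static int index(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(-7 % 11);               // -7: not a valid array index
        System.out.println(Math.floorMod(-7, 11)); // 4: always in 0..10
        System.out.println(index("abba", 11));     // index for a String key
    }
}
```

For keys whose hashCode() is already non-negative, such as the short strings in these slides, floorMod and % give the same index.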

With a better hash method
Collisions still happen

    0. [IKLT=IKLT $600.00]
    1. [abba=abba $100.00]
    2. [abcd=abcd $200.00]
    3. [baab=baab $400.00, abce=abce $300.00]
    4. [KLMP=KLMP $500.00]
    5. []
    6. []
    7. []
    8. []
    9. []
    10. []

Experiment

    // Rick's linear probing implementation
    // Array size was 75,007, using Weiss's hash
    Time to construct an empty hashtable: 0.161 seconds
    Time to build table of 50000 entries: 0.65 seconds
    Time to lookup each table entry once: 0.19 seconds

    // 8000 arrays of Linked lists, using Weiss's hash
    Time to construct an empty hashtable: 0.04 seconds
    Time to build table of 50000 entries: 0.741 seconds
    Time to lookup each table entry once: 0.281 seconds

    // Java's HashMap
    Time to construct an empty hashtable: 0.0 seconds
    Time to build table of 50000 entries: 0.691 seconds
    Time to lookup each table entry once: 0.11 seconds

Runtimes
- What are the runtimes in big-O for linear probing with an array for these methods?
    - get _____
    - put _____
    - remove _____

Hash Table Summary
- Hashing involves transforming data to produce an integer in a fixed range
- The function that transforms the key into an array index is known as the hash function
- When two data values produce the same hash value, you get a collision
- Collision resolution may be done by searching for the next open slot at or after the position given by the hash function, wrapping around to the front of the table when you run off the end (known as linear probing)

Hash Table Summary
- Another common collision resolution technique is to store the table as an array of linked lists and to keep at each array index the list of values that yield that hash value (known as separate chaining)
- Most often the data stored in a hash table includes both a key field and a data field (e.g., social security number and student information). The key field is used to determine where to store the data in the hash table. A lookup on that key will then return the data associated with that key if it is stored in the table