Hashtables

Hashtables
• An abstract data type that supports the following operations:
  – Insert
  – Find
  – Remove
• Search trees support the same operations, but they require an order relation to be defined on the keys, and each operation takes logarithmic time.
• Hashtables do not require an order relation on the elements, and all operations take O(1) time on average.

Direct Access Tables
• Assume that the keys are distinct numbers in the range U = {1, 2, 3, …, m}. Use an array of size m and place the element with key k at index k of the array.
• O(1) time for all operations
• Problem: wasteful for small sets and impractical if m is very large

Hashtables
• Main idea: instead of using the keys themselves as indices into the table, use a hash function to map keys to indices.
• Note: U is the set of all possible keys, so it is usually much larger than the table size m.

Simple Uniform Hashing
• We assume a hash function that, given a key, hashes it into any slot with equal probability.
• We will try to provide some reasonable hash functions later.

Hash functions
• The hash function is responsible for mapping keys to integers (slot numbers). A good hash function must have the following properties:
  – 1. Easy to evaluate: computing h(x) takes O(1) time
  – 2. Uniform distribution over all the table slots
  – 3. Similar keys are mapped to different slots

Hash functions
• The first step is to represent the key as a natural number.
• For example, if S is a string, we can interpret it as an integer value, for instance by treating its characters as the digits of a number in a fixed radix.
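One common way to turn a string into an integer is to read its characters as digits in a fixed radix. The sketch below is only an illustration of that idea, not the formula from the original slide: the radix 31 is an assumption (it is the value used by java.lang.String.hashCode), and the names StringKeyDemo, toInteger and slot are hypothetical.

```java
// Sketch: interpret a string as an integer, then reduce it to a slot number.
// RADIX = 31 is an assumption, chosen to match java.lang.String.hashCode.
class StringKeyDemo {
    static final int RADIX = 31;

    // Horner's rule for s[0]*R^(n-1) + s[1]*R^(n-2) + ... + s[n-1].
    // int arithmetic wraps around on overflow, which is acceptable for hashing.
    static int toInteger(String s) {
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            value = RADIX * value + s.charAt(i);
        }
        return value;
    }

    // Map the integer to a slot in [0, m); floorMod never returns a negative slot.
    static int slot(String s, int m) {
        return Math.floorMod(toInteger(s), m);
    }

    public static void main(String[] args) {
        System.out.println(toInteger("abc"));      // same value as "abc".hashCode()
        System.out.println(slot("hashtable", 11)); // some index between 0 and 10
    }
}
```

Horner's rule keeps the computation linear in the length of the string, so the function is still cheap to evaluate.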

Collisions
• Mapping keys to indices can cause collisions: two keys collide if the hash function maps them to the same index.
• Solutions
  – Chaining
  – Open addressing

Collision resolution - Chaining
• All keys that have the same hash value are placed in a linked list
• Insertion can be done at the beginning of the list in O(1) time
• Search time is proportional to the length of the list

Collision resolution by chaining
• Let the hash table have 9 slots and h(k) = k mod 9. Insert the elements 6, 43, 23, 62, 1, 13, 34, 55, 25:
  – h(6) = 6 mod 9 = 6
  – h(43) = 43 mod 9 = 7
  – h(23) = 23 mod 9 = 5
  – h(62) = 62 mod 9 = 8
  – h(1) = 1 mod 9 = 1
  – h(13) = 13 mod 9 = 4
  – h(34) = 34 mod 9 = 7
  – h(55) = 55 mod 9 = 1
  – h(25) = 25 mod 9 = 7
• Keys 43, 34 and 25 collide in slot 7, and keys 1 and 55 collide in slot 1; each group is stored in that slot's list.
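The chaining example above can be replayed with a short program. This is a minimal sketch in Java, assuming integer keys, h(k) = k mod m, and insertion at the head of each list; the class name ChainedHashTable and its method names are illustrative and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of a hashtable with chaining for integer keys.
class ChainedHashTable {
    private final List<LinkedList<Integer>> slots = new ArrayList<>();
    private final int m;

    ChainedHashTable(int m) {
        this.m = m;
        for (int i = 0; i < m; i++) slots.add(new LinkedList<>());
    }

    // Division method; floorMod keeps the slot non-negative for negative keys.
    private int h(int k) { return Math.floorMod(k, m); }

    // O(1): insert at the beginning of the chain.
    void insert(int k) { slots.get(h(k)).addFirst(k); }

    // Time proportional to the length of the chain.
    boolean find(int k) { return slots.get(h(k)).contains(k); }

    // Removes one occurrence of k; returns false if k was not present.
    boolean remove(int k) { return slots.get(h(k)).remove(Integer.valueOf(k)); }

    List<Integer> chain(int slot) { return slots.get(slot); }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(9);
        for (int k : new int[] {6, 43, 23, 62, 1, 13, 34, 55, 25}) t.insert(k);
        for (int i = 0; i < 9; i++) System.out.println(i + ": " + t.chain(i));
    }
}
```

Because new keys go to the head of the list, the chain in slot 7 lists the keys in reverse insertion order.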

Analysis
• The load factor α of a hashtable is the number of elements stored in the table, n, divided by the number of slots, m: α = n / m
• With chaining, a search takes Θ(1 + α) time on average under the assumption of simple uniform hashing

Division method
• An appropriate hash function for a hashtable that uses chaining is the division method: h(k) = k mod m
• Powers of 10 and 2 should be avoided as values of m
• Good values of m are primes not close to powers of 2

Open Addressing
• Each element occupies a single slot of the hashtable itself. No chaining is done.
• To insert an element, we probe the table according to the hash function until an empty slot is found.
• The hash function is now a function of both the key and the probe number (the number of attempts made so far): h(k, i)

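Hash Insert
• Insert key k into table T (null marks an empty slot):

    int hashInsert(Integer[] T, int k) {
        int m = T.length;
        for (int i = 0; i < m; i++) {
            int j = h(k, i);          // i-th slot in the probe sequence of k
            if (T[j] == null) {       // found an empty slot
                T[j] = k;
                return j;
            }
        }
        // all m probes hit occupied slots
        throw new IllegalStateException("hashtable overflow");
    }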

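Hash Search
• Search for key k in table T; returns the slot of k, or -1 if k is not found:

    int hashSearch(Integer[] T, int k) {
        int m = T.length;
        for (int i = 0; i < m; i++) {
            int j = h(k, i);              // follow the same probe sequence as insert
            if (T[j] == null) return -1;  // empty slot: k is not in the table
            if (T[j] == k) return j;      // found k at slot j
        }
        return -1;                        // probed all m slots without finding k
    }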

Linear probing
• Linear probing takes an ordinary hash function h', such as one based on the division method, and turns it into: h(k, i) = (h'(k) + i) mod m
• If a slot is occupied, we try the subsequent slot, and so on; thus the initial slot determines the entire probing sequence for insertion and search.

Linear Probing
• Easy to implement, but suffers from primary clustering.
• An empty slot that directly follows a run of occupied slots is more likely to be filled next than any other empty slot, so long runs tend to grow longer.

Linear Probing
• Given a hash function h', the linear probing scheme is simply h(k, i) = (h'(k) + i) mod m

Exercise
• You are given a hash table with m = 11 slots. Demonstrate inserting the following elements using linear probing and the hash function h'(k) = k mod m:
  – 10, 22, 31, 4, 15, 28, 17, 88, 59

Solution
• h(10, 0) = (10 mod 11 + 0) mod 11 = 10
• h(22, 0) = (22 mod 11 + 0) mod 11 = 0
• h(31, 0) = (31 mod 11 + 0) mod 11 = 9
• h(4, 0) = (4 mod 11 + 0) mod 11 = 4
• h(15, 0) = (15 mod 11 + 0) mod 11 = 4 (occupied)
• h(15, 1) = (15 mod 11 + 1) mod 11 = 5
• h(28, 0) = (28 mod 11 + 0) mod 11 = 6
• h(17, 0) = (17 mod 11 + 0) mod 11 = 6 (occupied)
• h(17, 1) = (17 mod 11 + 1) mod 11 = 7
• h(88, 0) = (88 mod 11 + 0) mod 11 = 0 (occupied)
• h(88, 1) = (88 mod 11 + 1) mod 11 = 1
• h(59, 0) = (59 mod 11 + 0) mod 11 = 4 (occupied)
• h(59, 1) = (59 mod 11 + 1) mod 11 = 5 (occupied)
• h(59, 2) = (59 mod 11 + 2) mod 11 = 6 (occupied)
• h(59, 3) = (59 mod 11 + 3) mod 11 = 7 (occupied)
• h(59, 4) = (59 mod 11 + 4) mod 11 = 8
• Resulting table:

  Slot:  0   1   2   3   4   5   6   7   8   9   10
  Key:   22  88  -   -   4   15  28  17  59  31  10
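The linear probing exercise can be checked by running the insertions in code. The following is a minimal sketch in Java, assuming h(k, i) = (k mod m + i) mod m as used in the solution steps; the class LinearProbingDemo and its methods are illustrative names.

```java
import java.util.Arrays;

// Sketch: open addressing with linear probing for integer keys.
class LinearProbingDemo {
    // h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m.
    static int probe(int k, int i, int m) {
        return (Math.floorMod(k, m) + i) % m;
    }

    // Inserts the keys in order into a table of m slots; null marks an empty slot.
    static Integer[] insertAll(int[] keys, int m) {
        Integer[] table = new Integer[m];
        for (int k : keys) {
            boolean placed = false;
            for (int i = 0; i < m && !placed; i++) {
                int j = probe(k, i, m);
                if (table[j] == null) {
                    table[j] = k;
                    placed = true;
                }
            }
            if (!placed) throw new IllegalStateException("hashtable overflow");
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {10, 22, 31, 4, 15, 28, 17, 88, 59};
        System.out.println(Arrays.toString(insertAll(keys, 11)));
    }
}
```

Key 59 needs five probes here because slots 4 to 7 already form a run, which is the primary clustering effect in miniature.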

Quadratic Probing
• Quadratic probing again starts from an initial hash function h', and the hash function is now h(k, i) = (h'(k) + c1·i + c2·i²) mod m, where c1 and c2 are constants and c2 ≠ 0
• The slot tried after an occupied slot depends on the probe number i.
• Quadratic probing suffers from a secondary form of clustering, since the initial probe alone determines the entire probing sequence.

Quadratic Probing
• Given a hash function h', quadratic probing is done by: h(k, i) = (h'(k) + c1·i + c2·i²) mod m

Example
• You are given a hash table with 11 slots. Demonstrate inserting the following elements using quadratic probing and the hash function h(k, i) = (k mod 11 + i + 3i²) mod 11:
  – 10, 22, 31, 4, 15, 28, 17, 88, 59

Solution
• h(10, 0) = (10 mod 11 + 0) mod 11 = 10
• h(22, 0) = (22 mod 11 + 0) mod 11 = 0
• h(31, 0) = (31 mod 11 + 0) mod 11 = 9
• h(4, 0) = (4 mod 11 + 0) mod 11 = 4
• h(15, 0) = (15 mod 11 + 0) mod 11 = 4 (occupied)
• h(15, 1) = (15 mod 11 + 1 + 3) mod 11 = 8
• h(28, 0) = (28 mod 11 + 0) mod 11 = 6
• h(17, 0) = (17 mod 11 + 0) mod 11 = 6 (occupied)
• h(17, 1) = (17 mod 11 + 1 + 3) mod 11 = 10 (occupied)
• h(17, 2) = (17 mod 11 + 2 + 12) mod 11 = 9 (occupied)
• h(17, 3) = (17 mod 11 + 3 + 27) mod 11 = 3
• h(88, 0) = (88 mod 11 + 0) mod 11 = 0 (occupied)
• h(88, 1) = (88 mod 11 + 1 + 3) mod 11 = 4 (occupied)
• h(88, 2) = (88 mod 11 + 2 + 12) mod 11 = 3 (occupied)
• h(88, 3) = (88 mod 11 + 3 + 27) mod 11 = 8 (occupied)
• h(88, 4) = (88 mod 11 + 4 + 48) mod 11 = 8 (occupied)
• h(88, 5) = (88 mod 11 + 5 + 75) mod 11 = 3 (occupied)
• h(88, 6) = (88 mod 11 + 6 + 108) mod 11 = 4 (occupied)
• h(88, 7) = (88 mod 11 + 7 + 147) mod 11 = 0 (occupied)
• h(88, 8) = (88 mod 11 + 8 + 192) mod 11 = 2
• h(59, 0) = (59 mod 11 + 0) mod 11 = 4 (occupied)
• h(59, 1) = (59 mod 11 + 1 + 3) mod 11 = 8 (occupied)
• h(59, 2) = (59 mod 11 + 2 + 12) mod 11 = 7
• Resulting table:

  Slot:  0   1   2   3   4   5   6   7   8   9   10
  Key:   22  -   88  17  4   -   28  59  15  31  10
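The quadratic probing example can be replayed the same way. This is a minimal sketch in Java, assuming h(k, i) = (k mod m + c1·i + c2·i²) mod m with c1 = 1 and c2 = 3, the constants implied by the worked steps; the class QuadraticProbingDemo and its methods are illustrative names.

```java
import java.util.Arrays;

// Sketch: open addressing with quadratic probing for integer keys.
class QuadraticProbingDemo {
    static final int C1 = 1; // assumed from the worked example
    static final int C2 = 3; // assumed from the worked example

    // h(k, i) = (h'(k) + c1*i + c2*i^2) mod m, with h'(k) = k mod m.
    static int probe(int k, int i, int m) {
        return (Math.floorMod(k, m) + C1 * i + C2 * i * i) % m;
    }

    // Inserts the keys in order; null marks an empty slot.
    static Integer[] insertAll(int[] keys, int m) {
        Integer[] table = new Integer[m];
        for (int k : keys) {
            boolean placed = false;
            for (int i = 0; i < m && !placed; i++) {
                int j = probe(k, i, m);
                if (table[j] == null) {
                    table[j] = k;
                    placed = true;
                }
            }
            // Quadratic probing may revisit slots, so this can trigger
            // even when the table still has empty slots.
            if (!placed) throw new IllegalStateException("no free slot found");
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {10, 22, 31, 4, 15, 28, 17, 88, 59};
        System.out.println(Arrays.toString(insertAll(keys, 11)));
    }
}
```

Key 88 visits slots 8, 3, 4 and 0 more than once before reaching slot 2, which shows that a quadratic probe sequence is not guaranteed to cover the table without repeats.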

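Double Hashing
• Given two hash functions h1 and h2, the probe sequence is h(k, i) = (h1(k) + i·h2(k)) mod m
• Problem: h2(k) and m should not have any common divisors (other than 1); otherwise the probe sequence does not reach every slot.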

Double Hashing
• Example 1: select m to be a power of 2, and design h2 to produce odd numbers.
• Example 2: select m to be prime, and m' to be m - 1, with h1(k) = k mod m and h2(k) = 1 + (k mod m').
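To see why the step size must share no divisor with m, it helps to print a full probe sequence. The sketch below is in Java and assumes the common choice h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)) with m prime; these two functions and the names DoubleHashingDemo, probe and probeSequence are assumptions made for illustration.

```java
import java.util.Arrays;

// Sketch: probe sequences under double hashing.
class DoubleHashingDemo {
    // h(k, i) = (h1(k) + i * h2(k)) mod m
    static int probe(int k, int i, int m) {
        int h1 = Math.floorMod(k, m);
        int h2 = 1 + Math.floorMod(k, m - 1); // in [1, m-1], so never 0
        return (h1 + i * h2) % m;
    }

    // The first m probes for key k. When m is prime, h2(k) and m share no
    // divisor, so these m probes visit every slot exactly once.
    static int[] probeSequence(int k, int m) {
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) seq[i] = probe(k, i, m);
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(14, 13)));
    }
}
```

For k = 14 and m = 13 the sequence starts at slot 1 and advances in steps of 3, so two keys that share a first slot usually still follow different sequences.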

Analysis
• In open addressing the load factor α cannot be more than 1.
• Insertion and unsuccessful search require at most 1 / (1 - α) probes on average
• A successful search takes at most (1 / α) ln(1 / (1 - α)) probes on average

Analysis
• When the table is 50% full, a successful search requires fewer than 1.387 probes on average
• When the table is 90% full, a successful search requires fewer than 2.559 probes on average
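The two figures above can be reproduced numerically. The sketch below is in Java and assumes the standard uniform-hashing bounds for open addressing, 1 / (1 - α) for an unsuccessful search or insertion and (1 / α) ln(1 / (1 - α)) for a successful search; the class and method names are illustrative.

```java
// Sketch: expected-probe bounds for open addressing under uniform hashing.
class OpenAddressingBounds {
    // Bound for an unsuccessful search or an insertion: 1 / (1 - alpha).
    static double unsuccessful(double alpha) {
        return 1.0 / (1.0 - alpha);
    }

    // Bound for a successful search: (1 / alpha) * ln(1 / (1 - alpha)).
    static double successful(double alpha) {
        return Math.log(1.0 / (1.0 - alpha)) / alpha;
    }

    public static void main(String[] args) {
        for (double alpha : new double[] {0.5, 0.9}) {
            System.out.printf("alpha=%.1f unsuccessful<=%.3f successful<=%.3f%n",
                    alpha, unsuccessful(alpha), successful(alpha));
        }
    }
}
```

At 90% load an unsuccessful search is bounded by 10 probes while a successful one stays below 3, which is why the load factor is usually kept well under 1.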

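Problems with open addressing
• If an element is deleted, we cannot simply empty its slot, since later searches for keys that probed past that slot would stop early and fail. Rehashing the remaining elements after every deletion would ruin the running time.
• Solution: mark the slot with a special DELETED value.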
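The DELETED marker idea can be sketched as follows. This is an illustrative Java example, assuming linear probing with h'(k) = k mod m and integer keys that are inserted at most once; the class TombstoneTable and its methods are hypothetical names.

```java
// Sketch: open addressing where removed keys leave a DELETED marker behind.
class TombstoneTable {
    private static final Object DELETED = new Object(); // marks a removed key
    private final Object[] table;                       // null marks an empty slot
    private final int m;

    TombstoneTable(int m) {
        this.m = m;
        this.table = new Object[m];
    }

    private int probe(int k, int i) {
        return (Math.floorMod(k, m) + i) % m;
    }

    // Returns the slot of k, or -1. An empty slot ends the search; a DELETED
    // slot does not, because k may have been inserted past it.
    int search(int k) {
        for (int i = 0; i < m; i++) {
            int j = probe(k, i);
            if (table[j] == null) return -1;
            if (table[j] != DELETED && table[j].equals(k)) return j;
        }
        return -1;
    }

    // Stores k in the first empty or DELETED slot of its probe sequence.
    // Assumes k is not already in the table.
    boolean insert(int k) {
        for (int i = 0; i < m; i++) {
            int j = probe(k, i);
            if (table[j] == null || table[j] == DELETED) {
                table[j] = k;
                return true;
            }
        }
        return false; // overflow
    }

    boolean remove(int k) {
        int j = search(k);
        if (j < 0) return false;
        table[j] = DELETED;
        return true;
    }

    public static void main(String[] args) {
        TombstoneTable t = new TombstoneTable(11);
        t.insert(4); t.insert(15); t.insert(26); // all three start at slot 4
        t.remove(15);
        System.out.println(t.search(26));        // 26 is still reachable
    }
}
```

Markers keep searches correct, but they also keep probe sequences long, so a table with many deletions may still need an occasional rebuild.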

Rehashing
• If we do not know the number of elements in advance, we use a technique similar to the one used in vectors: once the load factor reaches some predefined threshold, rehash the data into a larger hashtable.
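Growing the table on demand can be sketched as below. This Java example assumes linear probing, unique integer keys, a threshold of 0.5 and capacity doubling; all four are choices made for the illustration, and the names ResizingTable, place and rehash are hypothetical. Doubling keeps the code short, although a table size that is a power of 2 is not a good fit for the division method.

```java
// Sketch: a linear-probing table that rehashes into a larger array
// once the load factor would exceed a threshold.
class ResizingTable {
    private static final double MAX_LOAD = 0.5; // assumed threshold
    private Integer[] table = new Integer[4];   // assumed initial capacity
    private int size = 0;

    // Assumes k is not already present.
    void insert(int k) {
        if (size + 1 > MAX_LOAD * table.length) rehash(2 * table.length);
        place(table, k);
        size++;
    }

    boolean contains(int k) {
        int m = table.length;
        for (int i = 0; i < m; i++) {
            Integer e = table[(Math.floorMod(k, m) + i) % m];
            if (e == null) return false;
            if (e == k) return true;
        }
        return false;
    }

    // Linear probing; terminates because the load factor stays below 1.
    private static void place(Integer[] t, int k) {
        int m = t.length;
        int j = Math.floorMod(k, m);
        while (t[j] != null) j = (j + 1) % m;
        t[j] = k;
    }

    // Every key must be re-inserted, because its slot depends on the table size.
    private void rehash(int newCapacity) {
        Integer[] old = table;
        table = new Integer[newCapacity];
        for (Integer k : old) {
            if (k != null) place(table, k);
        }
    }

    int capacity() { return table.length; }
    int size() { return size; }

    public static void main(String[] args) {
        ResizingTable t = new ResizingTable();
        for (int k = 0; k < 20; k++) t.insert(k * 7);
        System.out.println(t.size() + " keys in " + t.capacity() + " slots");
    }
}
```

As with vectors, each rehash costs time linear in the table size, but doubling makes the cost per insertion constant when averaged over a long sequence of insertions.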

Example
• Given a set S of unique integers and a number z, find x, y ∈ S such that x + y = z
  – An efficient worst case algorithm
  – An efficient average case algorithm

An efficient worst case algorithm
• 1. Sort all elements in S.
• 2. For every x in S, search for y = z - x in S using binary search
  – Total of O(n log n)

An efficient average case algorithm
• 1. We use a hash table where m is of order n; for all x ∈ S we execute insert(x)
• 2. For all x ∈ S we execute search(z - x)
• Total: O(n) average case, O(n²) worst case
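The average case algorithm can be sketched with the standard library's java.util.HashSet standing in for the hash table. The sketch assumes that x and y must be two different elements of S, which the slides do not state explicitly; the class PairSum and the method findPair are illustrative names.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: find x, y in S with x + y = z using a hash table.
class PairSum {
    // Returns {x, y}, or null if no such pair exists.
    // Assumption: x and y must be different elements of S.
    static int[] findPair(int[] s, int z) {
        Set<Integer> table = new HashSet<>();
        for (int x : s) table.add(x);           // step 1: insert every element
        for (int x : s) {                       // step 2: search for z - x
            int y = z - x;
            if (y != x && table.contains(y)) return new int[] {x, y};
        }
        return null;
    }

    public static void main(String[] args) {
        int[] pair = findPair(new int[] {3, 9, 14, 20}, 23);
        System.out.println(pair[0] + " + " + pair[1]);
    }
}
```

Both loops perform n hash table operations, which gives the O(n) average running time.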

Example
• Given a set S of sortable items, we are asked whether all items in S are unique.
• 1. Sort the elements of S.
• 2. Iterate over the elements of S, looking for adjacent equal values.
• Execution time: O(n log n)

Example
• 1. Use a hash table where m is of order n. For all x ∈ S we execute insert(x). We modify the insert operation to signal if x already exists in the table (every insert includes a search operation).
• Execution time: average case O(n)
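The hash-based uniqueness check maps directly onto java.util.HashSet, whose add method already reports whether the element was present. A minimal sketch in Java; the class Uniqueness and the method allUnique are illustrative names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: check that all items are distinct using a hash table.
class Uniqueness {
    static <T> boolean allUnique(Iterable<T> items) {
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            // add returns false if the item was already in the set,
            // which plays the role of the modified insert operation.
            if (!seen.add(item)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUnique(Arrays.asList(3, 1, 2)));
        System.out.println(allUnique(Arrays.asList(3, 1, 3)));
    }
}
```

Unlike the sorting approach, this version needs only equals and hashCode on the items, not an order relation.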

Java hashCode
• Every Java object has a method public int hashCode(), which is defined in class Object and exists to support hashtables and hashmaps.
• The default implementation returns a number that is typically derived from the object's memory location, so distinct objects usually get distinct values.
• If two objects are equal (according to equals), they must have the same hash code.

Java hashCode
• Distinct objects are not required to have distinct hash codes, but distinct hash codes improve the performance of hashtables.
• Can the hash code of an object change throughout its life cycle?
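The rule that equal objects must have equal hash codes is usually satisfied by overriding equals and hashCode together and computing both from the same fields. The sketch below shows one way to do that in Java; the class GridPoint is a hypothetical example, and making its fields final is a design choice for the illustration, so that its hash code stays the same while it is stored in a table.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch: a value class whose equals and hashCode use the same fields.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Equal points have equal fields, hence equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<GridPoint> set = new HashSet<>();
        set.add(new GridPoint(1, 2));
        // Found through a different but equal object.
        System.out.println(set.contains(new GridPoint(1, 2)));
    }
}
```

Without the hashCode override, the lookup in main would use the default hash code of the second object and would most likely miss the stored point.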