Hash Tables
Lecture 7
Jeff Edmonds, York University, COSC 2011

Outline:
- Dictionary/Map ADT
- Direct Addressing
- Hash Tables
- Random Algorithms
- Universal Hash Functions
- Key to Integer
- Separate Chaining
- Probe Sequence
- Put, Get, Del, Iterators
- Running Time
- Simpler Schemes
Random Balls in Bins
Throw m/2 balls (keys) randomly into m bins (array cells). The balls get spread out reasonably well.
- Exp(# of other balls a ball shares a bin with) = O(1)
- O(1) bins contain O(log n) balls.
Dictionary/Map ADT
Input (key, value): (k1, v1), (k2, v2), (k3, v3), (k4, v4)
Problem: Store value/data associated with keys.
Examples:
• key = word, value = definition
• key = social insurance number, value = person's data
Dictionary/Map ADT
• Map ADT methods:
  – put(k, v): insert entry (k, v) into the map M.
    • If there was a previous value associated with k, return it.
    • (A Multi-Map allows multiple values with the same key.)
  – get(k): return the value v associated with key k (else null).
  – remove(k): remove key k and its associated value.
  – size(), isEmpty()
  – Iterators:
    • keys(): over the keys k in M
    • values(): over the values v in M
    • entries(): over the entries (k, v) in M
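Dictionary/Map ADT

Operation    Output   M
isEmpty()    true     Ø
put(5, A)    null     (5, A)
put(7, B)    null     (5, A), (7, B)
put(2, C)    null     (5, A), (7, B), (2, C)
put(8, D)    null     (5, A), (7, B), (2, C), (8, D)
put(2, E)    C        (5, A), (7, B), (2, E), (8, D)
get(7)       B        (5, A), (7, B), (2, E), (8, D)
get(4)       null     (5, A), (7, B), (2, E), (8, D)
get(2)       E        (5, A), (7, B), (2, E), (8, D)
size()       4        (5, A), (7, B), (2, E), (8, D)
remove(5)    A        (7, B), (2, E), (8, D)
remove(2)    E        (7, B), (8, D)
get(2)       null     (7, B), (8, D)
isEmpty()    false    (7, B), (8, D)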
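The trace above can be replayed in a few lines. This is only a sketch: the class name SimpleMap and its method names are made up for illustration, it leans on Python's built-in dict rather than any structure from the lecture, and None stands in for null.

```python
class SimpleMap:
    """Toy Map ADT backed by a Python dict (illustration only)."""

    def __init__(self):
        self._d = {}

    def put(self, k, v):
        old = self._d.get(k)       # None plays the role of null
        self._d[k] = v
        return old                 # previous value for k, if any

    def get(self, k):
        return self._d.get(k)

    def remove(self, k):
        return self._d.pop(k, None)

    def size(self):
        return len(self._d)

    def is_empty(self):
        return not self._d


def run_trace():
    """Replay the operations from the table and collect their outputs."""
    m = SimpleMap()
    return [
        m.is_empty(), m.put(5, "A"), m.put(7, "B"), m.put(2, "C"),
        m.put(8, "D"), m.put(2, "E"), m.get(7), m.get(4), m.get(2),
        m.size(), m.remove(5), m.remove(2), m.get(2), m.is_empty(),
    ]
```

Calling run_trace() yields the Output column of the table, row by row.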
Dictionary/Map ADT
Problem: Store value/data associated with keys.
[Figure: an array whose cells 0, 1, 2, ... hold (k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5) in arrival order.]

Implementations:
                   Insert   Search
Unordered Array    O(1)     O(n)
Dictionary/Map ADT
Problem: Store value/data associated with keys.
[Figure: an array holding (2, v3), (4, v4), (7, v1), (9, v2) in key order; the new entry (6, v5) must be inserted between keys 4 and 7.]

Implementations:
                   Insert   Search
Unordered Array    O(1)     O(n)
Ordered Array      O(n)     O(log n)
Dictionary/Map ADT
Problem: Store value/data associated with keys.
[Figure: a linked list with header and trailer nodes holding the entries (2, v3), (4, v4), (7, v1), (9, v2); the new entry (6, v5) is to be inserted.]

Implementations:
                      Insert   Search
Unordered Array       O(1)     O(n)
Ordered Array         O(n)     O(log n)
Ordered Linked List   O(n)     O(n)

Inserting into the linked list is O(1) if you have the spot, but O(n) to find the spot.
Dictionary/Map ADT
Problem: Store value/data associated with keys.
[Figure: a binary search tree rooted at 38.]

Implementations:
                      Insert     Search
Unordered Array       O(1)       O(n)
Ordered Linked List   O(n)       O(n)
Binary Search Tree    O(log n)   O(log n)
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Hash Tables are very fast, but keys have no order.

Implementations:
                      Insert      Search
Unordered Array       O(1)        O(n)
Ordered Array         O(n)        O(log n)
Ordered Linked List   O(n)        O(n)
Binary Search Tree    O(log n)    O(log n)
Hash Tables           Avg: O(1)   Avg: O(1)

Next (finding the next key in order): O(n) for an unordered array, O(1) for an ordered array or ordered linked list, O(n) for a hash table.
Direct Addressing
Input (key, value): (5, v1), (9, v2), (2, v3), (7, v4), (4, v5), and the query (7, ?)
Suppose your array was REALLY big. Where would you store (5, v1)? In cell 5. This is called Direct Addressing.

Implementations:
                    Insert   Search   Next
Direct Addressing   O(1)     O(1)     O(1)
Direct Addressing
Consider the universe of all possible keys. The mapping from key to array cell is 1-1. This is called Direct Addressing.
[Figure: universe of keys 0..9 mapped one-to-one onto array cells 0..9; cell 2 holds (2, v3), cell 4 holds (4, v5), cell 5 holds (5, v1), cell 7 holds (7, v4), cell 9 holds (9, v2).]

Implementations:
                    Insert   Search   Next
Direct Addressing   O(1)     O(1)     O(1)
Direct Addressing
If most keys in the universe are used, then direct addressing is a fine data structure.
[Figure: as before, with the used keys labelled k3 = 2, k5 = 4, k1 = 5, k4 = 7, k2 = 9.]
Direct Addressing
The keys used are those of your customers. The universe of keys is likely huge (e.g., social insurance numbers).

                    Insert   Search   Next                         Memory
Direct Addressing   O(1)     O(1)     How far is the next key?     ∞
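A minimal sketch of direct addressing, assuming integer keys in the range 0..universe_size-1. The class and method names are hypothetical; the point is that put and get touch exactly one cell, while memory and next_key both scale with the size of the universe rather than with the number of stored items.

```python
class DirectAddressTable:
    """Direct addressing: key k lives in cell k (illustrative sketch)."""

    def __init__(self, universe_size):
        # One cell per possible key: this is the memory problem.
        self.cells = [None] * universe_size

    def put(self, k, v):
        old = self.cells[k]
        self.cells[k] = v
        return old

    def get(self, k):
        return self.cells[k]

    def remove(self, k):
        old = self.cells[k]
        self.cells[k] = None
        return old

    def next_key(self, k):
        # Scans forward; cost depends on how far away the next used key is.
        for j in range(k + 1, len(self.cells)):
            if self.cells[j] is not None:
                return j
        return None
```

With a universe of ten keys this is harmless; with nine-digit social insurance numbers the array would need a billion cells for a handful of customers.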
Hash Tables
Consider an array whose size is about the number of items stored.
The keys used are those of your customers. The universe of keys is likely huge (e.g., social insurance numbers).
The mapping from key to array cell is many-to-one. It is called a Hash Function: Hash(key) = i.
Hash Tables
Input (key, value): (5, v1), (9, v2), (2, v3), (7, v4), (4, v5), and the query (5, ?)
Consider an array whose size is about the number of items stored.
The mapping from key to array cell is many-to-one. It is called a Hash Function: Hash(key) = i.
Collisions are a problem.

Implementations:
                Insert   Search   Next
Hash Function   O(1)     O(1)     O(n)
Hash Tables
Collisions are a problem.
[Figure: keys 2 and 7 hash to the same cell, as do keys 9 and 4; key 5 has a cell to itself.]
• Choose Hash to minimize collisions.
• Deal with collisions.
Hash Tables
• Choose Hash to minimize collisions.
The algorithm chooses the hash function mapping keys to array cells. (It does not depend on the input.)
The input I specifies which keys are used.
The algorithm wants few collisions. The worst-case input wants lots of collisions.
Hash Tables
Input (key, value): (287, v1), (9, v2), (394, v3), (482, v4), (4, v5), and the query (583, ?)
• Choose Hash to minimize collisions.
The universe of keys is huge compared to the array. Hence, many keys get mapped to each array cell.
The worst-case input I uses keys that all go to the same array cell!
[Figure: all six keys land in one cell.]
Hash Tables
Input (key, value): (293 183, v1), (948 847, v2), (039 988, v3), (948 475, v4), (304 382, v5), and the query (948 847, ?)
If the keys are social insurance numbers, Hash(key) = i could be the last digit.
[Figure: cell 2 holds 304 382, cell 3 holds 293 183, cell 5 holds 948 475, cell 7 holds 948 847, cell 8 holds 039 988.]
A "random" input I would have all clients with different last digits, so there are few collisions.
But what is a "random" input I?
Hash Tables
Input (key, value): (287 005, v1), (923 005, v2), (394 005, v3), (482 005, v4), (193 005, v5), and the query (287 005, ?)
If the keys are social insurance numbers, Hash(key) = i could be the last digits.
The worst-case input I would have all clients with the same last three digits, so there are lots of collisions.
In practice, do you get the worst-case input?
Hash Tables
The worst-case input I would have all clients with the same last three digits.
To confuse the "worst case" input, the mapping Hash(key) = i that I choose is ... Random!
Probabilistic Algorithms
Probabilistic Algorithms
Problem P is computable if
    ∃A, ∀I, A(I) = P(I) and Time(A, I) ≤ T
Prover: I have an algorithm A that I claim works.
Adversary: Oh yeah, I have a worst-case input I for which it does not.
Prover: Actually, my algorithm always gives the right answer and is fast.
Probabilistic Algorithms
Problem P is computable by a random algorithm if
    ∃A, ∀I, ∀R, A_R(I) = P(I)
Remember Quick Sort.
Prover: I have a random algorithm A that I claim works.
Adversary: I know the algorithm A, but not its random coin flips R. I do my best to give you a worst-case input I.
Prover: The random coin flips R are independent of the input. And for EVERY input I, I want the correct answer.
Probabilistic Algorithms
Problem P is computable by a random algorithm if
    ∃A, ∀I, ∀R, A_R(I) = P(I) and Exp_R[ Time(A_R, I) ] ≤ T
Remember Quick Sort.
Prover: I have a random algorithm A that I claim works.
Adversary: I know the algorithm A, but not its random coin flips R. I do my best to give you a worst-case input I.
Prover: The random coin flips R are independent of the input. And for EVERY input I, the expected running time (over the choice of R) is great.
There are worst-case coin flips but not worst-case inputs.
Probabilistic Algorithms
Problem P is computable by a random algorithm if
    ∃A, ∀I, Pr_R[ A_R(I) ≠ P(I) ] ≤ 2^(-|I|) and Exp_R[ Time(A_R, I) ] ≤ T
Remember Quick Sort.
Prover: I have a random algorithm A that I claim works.
Adversary: I know the algorithm A, but not its random coin flips R. I do my best to give you a worst-case input I.
Some random algorithms might give the wrong answer on exponentially few random coin flips.
Random Hash Functions
Input (key, value): (287 005, v1), (923 005, v2), (394 005, v3), (482 005, v4), (193 005, v5), and the query (287 005, ?)
Fix the worst-case input I.
Choose a random mapping Hash(key) = i.
We don't expect there to be a lot of collisions.
[Figure: under a random hash function the keys land in scattered cells.]
(Actually, the random hash function is likely chosen and fixed before the input comes, but the key point is that the worst-case input does not "know" the hash function.)
Random Hash Functions
Throw m/2 balls (keys) randomly into m bins (array cells). The balls get spread out reasonably well.
- Exp(# of other balls a ball shares a bin with) = O(1)
- O(1) bins contain O(log n) balls.
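The balls-in-bins claim is easy to check empirically. The following is a rough simulation sketch, not part of the lecture: the function name, the default seed, and the returned tuple are all choices made here for illustration.

```python
import random
from collections import Counter


def balls_in_bins(m, seed=0):
    """Throw m//2 balls uniformly at random into m bins.

    Returns (balls thrown, balls counted, average number of other balls
    sharing a ball's bin, largest bin load).
    """
    rng = random.Random(seed)
    balls = m // 2
    loads = Counter(rng.randrange(m) for _ in range(balls))
    # A bin holding c balls gives each of them c - 1 companions.
    avg_shared = sum(c * (c - 1) for c in loads.values()) / balls
    return balls, sum(loads.values()), avg_shared, max(loads.values())
```

For m = 1000 the expected number of companions is (m/2 - 1)/m, just under one half, so the third returned value should come out as a small constant regardless of m.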
Universal Hash Functions
Choose a random mapping Hash(key) = i. We want Hash to be computed in O(1) time.
Theory people use
    Hash(key) = ((a · key) mod p) mod N
- N = size of the array
- p is a prime > |U|
- a is randomly chosen from [1..p-1]
- n is the number of data items
The a adds just enough randomness.
The integers mod p form a finite field, similar to the reals.
The mod N ensures the result indexes a cell in the array.
Universal Hash Functions
With Hash_a(key) = ((a · key) mod p) mod N as above:
Pairwise Independence: ∀ k1 ≠ k2, Pr_a( Hash_a(k1) = Hash_a(k2) ) ≈ 1/N
Proof sketch:
- Fix distinct k1, k2 ∈ U. Because p > |U|, (k2 - k1) mod p ≠ 0.
- Because p is prime, every nonzero element has an inverse mod p, e.g. 2 · 3 mod 5 = 1. Let e = (k2 - k1)^(-1).
- Let D = a (k2 - k1) mod p, so that a = D e mod p, and let d = D mod N.
- k1 and k2 collide iff d = 0 iff D = jN for some j ∈ [1..p/N] iff a = jN e mod p.
- The probability that a has one of these p/N values is (1/p) · (p/N) = 1/N.
(The sketch ignores wrap-around when the two values a·k1 mod p and a·k2 mod p are subtracted; accounting for it, the collision probability is at most 2/N, which is all the O(1) analysis needs.)
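Because a ranges over only p - 1 values, the collision probability for one fixed pair of keys can be computed exactly by brute force. The sketch below does that; the helper names make_hash and collision_fraction are made up here, and the parameters in the usage note (p = 1009, N = 11, keys 103 and 115) are simply the numbers used in the lecture's later worked example.

```python
def make_hash(a, p, N):
    """Return Hash_a(key) = ((a * key) mod p) mod N as a function."""
    def h(key):
        return ((a * key) % p) % N
    return h


def collision_fraction(k1, k2, p, N):
    """Fraction of multipliers a in [1..p-1] under which k1 and k2 collide."""
    hits = 0
    for a in range(1, p):
        h = make_hash(a, p, N)
        if h(k1) == h(k2):
            hits += 1
    return hits / (p - 1)
```

For example, collision_fraction(103, 115, 1009, 11) is the exact value of Pr_a( Hash_a(103) = Hash_a(115) ); it is guaranteed to be no larger than 2/N.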
Universal Hash Functions
With Hash_a(key) = ((a · key) mod p) mod N and Pr_a( Hash_a(k1) = Hash_a(k2) ) = 1/N as above:
Insert key k.
    Exp( # other keys in its cell )
      = Exp( Σ_{k1} [k1 and k collide] )
      = Σ_{k1} Pr( k1 and k collide )
      = n · 1/N
      = O(1).
Universal Hash Functions
Not much more independence than pairwise.
Knowing that Hash_a(k1) = Hash_a(k2) decreases the range of a from p to p/N values.
Doing this log p / log N times likely determines a, and hence all further collisions.
Universal Hash Functions
The hash function is usually written with (a · key + b) in place of (a · key).
The b adds randomness to which cells get hit, but does not help with collisions.
Key to Integer
Universe of keys: abandon, abbreviation, ability, able, about, above, abroad, absence, absent, absolute, …
Choose a random mapping Hash(key) = i.
If the key is a string a_0 a_1 … a_(n-1), convert it to an integer with
    k = a_0 + a_1 z + a_2 z^2 + … + a_(n-1) z^(n-1)
for fixed z (computed with k_i = a_(n-i-1) + z · k_(i-1)).
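The recurrence k_i = a_(n-i-1) + z · k_(i-1) is Horner's rule applied from the last character back to the first. A small sketch follows; the function name string_to_int, the default z = 31, and the use of ord() for the character codes a_i are assumptions made here, not choices stated in the lecture.

```python
def string_to_int(s, z=31):
    """Evaluate a_0 + a_1*z + ... + a_(n-1)*z^(n-1) by Horner's rule."""
    k = 0
    for ch in reversed(s):         # start from a_(n-1), end at a_0
        k = ord(ch) + z * k
    return k
```

For the two-letter string "ab" this gives ord('a') + ord('b') · 31 = 97 + 3038 = 3135. The resulting integer would then be fed to Hash(key).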
Key to Integer
Choose a random mapping Hash(key) = i.
If the key is an object in memory, you can use its address in memory as the key.
Handling Collisions
Input (key, value): (394, v1), (482, v2), (583, v3)
A collision occurs when different data items are mapped to the same cell.
[Figure: the three keys all hash to the same cell of an array with cells 0..10.]
Separate Chaining
Input (key, value): (394, v1), (482, v2), (583, v3)
Each cell uses external memory to store all the data items hitting that cell.
[Figure: one cell of the array points to a list holding (394, v1), (583, v3), (482, v2).]
Simple, but requires additional memory.
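Separate chaining can be sketched with one Python list per cell. Everything named here (ChainedHashTable, _index, the default parameters) is hypothetical; the hash is the ((a · key) mod p) mod N form from the universal hashing slides, with N = 11, p = 1009, a = 832 borrowed from the lecture's worked example.

```python
class ChainedHashTable:
    """Hash table with separate chaining (illustrative sketch)."""

    def __init__(self, N=11, a=832, p=1009):
        self.N, self.a, self.p = N, a, p
        self.buckets = [[] for _ in range(N)]   # external list per cell

    def _index(self, k):
        return ((self.a * k) % self.p) % self.N

    def put(self, k, v):
        bucket = self.buckets[self._index(k)]
        for i, (key, old) in enumerate(bucket):
            if key == k:                  # key already there: replace
                bucket[i] = (k, v)
                return old
        bucket.append((k, v))             # new key: extend the chain
        return None

    def get(self, k):
        for key, val in self.buckets[self._index(k)]:
            if key == k:
                return val
        return None

    def remove(self, k):
        bucket = self.buckets[self._index(k)]
        for i, (key, old) in enumerate(bucket):
            if key == k:
                del bucket[i]
                return old
        return None
```

Each operation scans only the chain in one cell, so its cost is proportional to the number of keys that collided there.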
A Sequence of Probes
Input (key, value): (394, v1), (482, v2), (583, v3)
Open addressing: the colliding item is placed in a different cell of the table.
A Sequence of Probes
Input (key, value): (103, v6)
Cells are chosen by a sequence of probes.
Table (N = 11): cell 0 holds (583, v3), cell 3 holds (290, v4), cell 5 holds (394, v1), cell 7 holds (482, v2), cell 8 holds (903, v5); the other cells are empty.

    put(key k, value v)
        i1 = (a·k mod p) mod N

Theory people use i1 = Hash(key) = (a · key mod p) mod N
- N = size of array = 11
- p is a prime > |U|: NextPrime(1000) = 1009
- a ∈ [1, p-1] is randomly chosen: a = 832
Example: a · key = 832 · 103 = 85696; 85696 mod 1009 = 940; 940 mod 11 = 5 = i1.
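The arithmetic of the worked example can be checked directly. The function name first_probe is made up for this sketch; the default values a = 832, p = 1009, N = 11 are the ones given on the slide.

```python
def first_probe(key, a=832, p=1009, N=11):
    """First probe position i1 = ((a * key) mod p) mod N."""
    product = a * key              # 832 * 103 = 85696 for key 103
    reduced = product % p          # 85696 mod 1009 = 940
    return reduced % N             # 940 mod 11 = 5
```

So first_probe(103) is 5, matching i1 on the slide.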
A Sequence of Probes
Input (key, value): (103, v6)
Cells are chosen by a sequence of probes.

    put(key k, value v)
        i1 = (a·k mod p) mod N      (= 5)

Cell 5 is already occupied by (394, v1). This was our first in the sequence of probes.
A Sequence of Probes
Input (key, value): (103, v6)
Double hash to get the sequence distance d.

    put(key k, value v)
        i1 = (a·k mod p) mod N      (= 5)
        d  = (b·k mod q) + 1        (= 3 on the slide)

A common secondary hash function: d = Hash2(key) = (b · key mod q) + 1
- q is a prime < N (here q = 7)
- the +1 ensures d ≠ 0
- b is chosen randomly
Slide example: b = 1, d = 103 mod 7 + 1 = 3. (Evaluated directly, 103 mod 7 + 1 = 5 + 1 = 6; the probe figures on the following slides use d = 3.)
A Sequence of Probes
Input (key, value): (103, v6)
Double hash to get the sequence distance d.

    put(key k, value v)
        i1 = (a·k mod p) mod N      (= 5)
        d  = (b·k mod q) + 1        (= 3)
        for j = 1..N
            i = (i1 + (j-1)·d) mod N

[Figure: with i1 = 5 and d = 3 the probes visit cells 5, 8, 0, 3, 6, ... in that order.]
If N is prime, this sequence will reach every cell.
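The probe loop can be written as a one-line generator of cell indices. The name probe_sequence is hypothetical; i1 = 5, d = 3, N = 11 are the values shown in the slide's figure.

```python
def probe_sequence(i1, d, N):
    """Cells visited for j = 1..N: i = (i1 + (j-1)*d) mod N."""
    return [(i1 + (j - 1) * d) % N for j in range(1, N + 1)]
```

probe_sequence(5, 3, 11) visits 5, 8, 0, 3, 6, 9, 1, 4, 7, 10, 2, which is every cell exactly once. When N is not prime the step may share a factor with N: probe_sequence(0, 2, 10) only ever touches the five even cells.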
A Sequence of Probes
Input (key, value): (103, v6)
Stop this sequence of probes when: the cell is empty, or the key is already there.

    value put(key k, value v)
        i1 = (a·k mod p) mod N      (= 5)
        d  = (b·k mod q) + 1        (= 3)
        for j = 1..N
            i = (i1 + (j-1)·d) mod N
            if ( cell(i) == empty )
                cell(i) = (k, v)
                return "was not there"
            if ( cellkey(i) == k )
                vold = cellvalue(i)
                cell(i) = (k, v)
                return vold

[Figure: the probes visit cells 5, 8, 0, 3, all occupied, and then the empty cell 6, where (103, v6) is stored.]
A Sequence of Probes
Input (key, value): (103, v6), (115, v7)
A different key gets different "random" hash values.

    value put(key k, value v)
        i1 = (a·k mod p) mod N      (= 3)
        d  = (b·k mod q) + 1        (= 2)
        for j = 1..N
            i = (i1 + (j-1)·d) mod N
            if ( cell(i) == empty )
                cell(i) = (k, v)
                return "was not there"
            if ( cellkey(i) == k )
                vold = cellvalue(i)
                cell(i) = (k, v)
                return vold

[Figure: the probes visit cells 3, 5, 7, all occupied, and then the empty cell 9, where (115, v7) is stored. Cell 6 now holds (103, v6).]
A Sequence of Probes
Input (key, value): (103, v6), (115, v7), then the query (103, ?)
Stop this sequence of probes when: the key is found.

    value get(key k)
        i1 = (a·k mod p) mod N      (= 5)
        d  = (b·k mod q) + 1        (= 3)
        for j = 1..N
            i = (i1 + (j-1)·d) mod N
            if ( cellkey(i) == k )
                return cellvalue(i)

[Figure: the probes visit cells 5, 8, 0, 3 and find key 103 in cell 6.]
A Sequence of Probes
Input (key, value): (103, v6), (115, v7), then the queries (103, ?) and (477, ?)
Stop this sequence of probes when: the key is found, or the cell is empty.

    value get(key k)
        i1 = (a·k mod p) mod N      (= 6)
        d  = (b·k mod q) + 1        (= 5)
        for j = 1..N
            i = (i1 + (j-1)·d) mod N
            if ( cellkey(i) == k )
                return cellvalue(i)
            if ( cell(i) == empty )
                return "not there"

[Figure: the probes for key 477 pass over occupied cells until they hit an empty cell, marked X.]
A Sequence of Probes
Input: (103, v6), (115, v7), (103, ?), (477, ?), then Del 583

    value del(key k)
        i1 = (a·k mod p) mod N
        d  = (b·k mod q) + 1
        for j = 1..N
            i = (i1 + (j-1)·d) mod N
            if ( cellkey(i) == k )
                vold = cellvalue(i)
                cell(i) = empty
                return vold

[Figure: key 583 is found in cell 0 on the first probe and the cell is set to empty.]
A Sequence of Probes
Input: (103, v6), (115, v7), (103, ?), (477, ?), Del 583, then the query (103, ?)
Stop this sequence of probes when: the key is found, or the cell is empty.

    value get(key k)
        i1 = (a·k mod p) mod N      (= 5)
        d  = (b·k mod q) + 1        (= 3)
        for j = 1..N
            i = (i1 + (j-1)·d) mod N
            if ( cellkey(i) == k )
                return cellvalue(i)
            if ( cell(i) == empty )
                return "not there"

[Figure: the probes visit cells 5 and 8, then stop at cell 0, which is now empty.]
Oops! It no longer finds 103.
A Sequence of Probes
Input: (103, v6), (115, v7), (103, ?), (477, ?), then Del 583
Fix: mark the cell as "deleted" instead of "empty".

    value del(key k)
        i1 = (a·k mod p) mod N
        d  = (b·k mod q) + 1
        for j = 1..N
            i = (i1 + (j-1)·d) mod N
            if ( cellkey(i) == k )
                vold = cellvalue(i)
                cell(i) = deleted
                return vold

[Figure: cell 0, which held (583, v3), is now marked "deleted".]
A Sequence of Probes
Input: (103, v6), (115, v7), (103, ?), (477, ?), Del 583, then the query (103, ?)
Stop this sequence of probes when: the key is found, or the cell is empty. A cell marked "deleted" is not empty, so the probes continue past it.

    value get(key k)
        i1 = (a·k mod p) mod N      (= 5)
        d  = (b·k mod q) + 1        (= 3)
        for j = 1..N
            i = (i1 + (j-1)·d) mod N
            if ( cellkey(i) == k )
                return cellvalue(i)
            if ( cell(i) == empty )
                return "not there"

[Figure: the probes visit cells 5, 8, the deleted cell 0, and 3, and then find key 103 in cell 6.]
Excellent!
A Sequence of Probes
Input: (103, v6), (115, v7), (103, ?), (477, ?), Del 583, (103, ?), then (115, v8)
Stop this sequence of probes when: the cell is empty or deleted, or the key is already there.

    value put(key k, value v)
        i1 = (a·k mod p) mod N
        d  = (b·k mod q) + 1
        for j = 1..N
            i = (i1 + (j-1)·d) mod N
            if ( cell(i) == empty or cell(i) == deleted )
                cell(i) = (k, v)
                return "was not there"
            if ( cellkey(i) == k )
                vold = cellvalue(i)
                cell(i) = (k, v)
                return vold
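Putting put, get, and del together, here is a runnable sketch of open addressing with double hashing and "deleted" markers. The class name, the sentinel objects EMPTY and DELETED, and the default parameters are illustrative assumptions (the slides' a = 832, p = 1009, N = 11, q = 7, b = 1), so cell positions need not match the figures. One deliberate difference from the pseudocode above: put remembers the first deleted cell it passes but keeps probing until it reaches an empty cell or the key itself, so that a key stored further along the sequence is updated in place rather than stored twice.

```python
EMPTY = object()      # never used cell
DELETED = object()    # cell whose item was removed


class DoubleHashTable:
    """Open addressing with double hashing (illustrative sketch)."""

    def __init__(self, N=11, a=832, p=1009, b=1, q=7):
        self.N, self.a, self.p, self.b, self.q = N, a, p, b, q
        self.cells = [EMPTY] * N

    def _probes(self, k):
        i1 = ((self.a * k) % self.p) % self.N
        d = ((self.b * k) % self.q) + 1       # step is never 0
        for j in range(self.N):
            yield (i1 + j * d) % self.N

    def get(self, k):
        for i in self._probes(k):
            cell = self.cells[i]
            if cell is EMPTY:                 # key cannot be further on
                return None
            if cell is not DELETED and cell[0] == k:
                return cell[1]
        return None

    def put(self, k, v):
        free = None                           # first reusable cell seen
        for i in self._probes(k):
            cell = self.cells[i]
            if cell is EMPTY:
                if free is None:
                    free = i
                break
            if cell is DELETED:
                if free is None:
                    free = i
            elif cell[0] == k:                # key already there
                self.cells[i] = (k, v)
                return cell[1]
        if free is None:
            raise RuntimeError("table is full")
        self.cells[free] = (k, v)
        return None

    def remove(self, k):
        for i in self._probes(k):
            cell = self.cells[i]
            if cell is EMPTY:
                return None
            if cell is not DELETED and cell[0] == k:
                self.cells[i] = DELETED       # leave a marker, not a hole
                return cell[1]
        return None
```

After inserting the lecture's keys, removing 583, and calling put(115, "v8"), a lookup of 103 still succeeds and key 115 occupies exactly one cell.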
Iterator
The time to iterate over all items in the data structure (in the order they appear in the array)
    = the size of the array N
    = O(# of data items)
(because rehashing keeps the load factor a constant).
[Figure: array with cells 0..10; cells 5 through 10 hold (394, v1), (193, v3), (472, v4), (938, v2), (873, v5), (093, v6).]
Running Time
The load factor α = n/N < 0.9, where n = # of data items and N = # of array cells.
Example: N = 11, n = 6, α = 6/11.
Running Time
The load factor α = n/N < 0.9, where n = # of data items and N = # of array cells.
Assuming each probe independently lands on an occupied cell with probability α:
    Exp( # of probes ) = Σ_j j · α^(j-1) · (1 - α) = 1/(1 - α) = O(1)
• j = # of probes
• α^(j-1) = probability the first j-1 probes collide
• (1 - α) = probability the jth probe finds an empty cell
Running Time
The load factor α = n/N < 0.9, so the expected number of probes is O(1).
Actually, these calculations require Hash to be purely random. If only the a is random, then we only have pairwise independence. If p >> n, then Jeff thinks the result may still be true, but he is not sure.
Running Time
When the load factor gets bigger than some threshold, rehash all items into an array that is double the size.
    Total cost of doubling = 1 + 2 + 4 + 8 + 16 + … + n = 2n - 1
    Amortized time = TotalTime(n)/n = (2n - 1)/n = O(1).
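The doubling argument can be checked by counting how many items get rehashed over n insertions. This sketch makes simplifying assumptions that are not in the lecture: the table starts with capacity 1 and doubles only when completely full (a threshold load factor of 1), and the function name is made up.

```python
def total_rehash_cost(n):
    """Number of items moved by all doublings during n insertions."""
    capacity, size, moved = 1, 0, 0
    for _ in range(n):
        if size == capacity:       # table full: rehash into double size
            moved += size          # every stored item is moved once
            capacity *= 2
        size += 1
    return moved
```

For n = 17 the doublings move 1 + 2 + 4 + 8 + 16 = 31 items, and in general the total stays below 2n, so the amortized cost per insertion is O(1).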
Simpler Hash Functions
There are simpler hash functions.

    value put(key k, value v)
        i1 = (a·k mod p) mod N
        d  = (b·k mod q) + 1
        for j = 1..N
            i = (i1 + (j-1)·d) mod N

Using i1 = key mod N, for table size N prime, could be perfectly fine.
Simpler Hash Functions
There are simpler hash functions.

    value put(key k, value v)
        i1 = (a·k mod p) mod N
        d  = 1
        for j = 1..N
            i = (i1 + (j-1)·d) mod N

Using d = 1 is called Linear Probing. The items tend to clump.
[Figure: with d = 1 the probes visit consecutive cells.]
Simpler Hash Functions
With d = 1 (Linear Probing), the items tend to clump.
[Figure: cells 5 through 10 hold (394, v1), (193, v3), (472, v4), (938, v2), (873, v5), (093, v6), forming one contiguous clump.]
End