Hash
Discrete Mathematics and Its Applications
Baojian Hua
bjhua@ustc.edu.cn

Searching
- A dictionary-like data structure
  - contains a collection of tuple data: <k1, v1>, <k2, v2>, …
  - keys are comparable and pair-wise distinct
  - supports these operations:
      new ()
      insert (dict, k, v)
      lookup (dict, k)
      delete (dict, k)

Examples
Application      Purpose       Key         Value
Phone Book       phone         name        phone No.
Bank             transaction   visa        $$$
Dictionary       lookup        word        meaning
compiler         symbol        variable    type
www.google.com   search        key words   contents
…                …             …           …

Summary So Far
op \ rep    array   sorted array   linked list   sorted linked list   binary search tree
lookup()    O(n)    O(lg n)        O(n)          O(n)                 O(n)
insert()    O(n)    O(n)
delete()    O(n)    O(n)

What’s the Problem?
- For every mapping (k, v):
  - After we insert it into the dictionary dict, we don’t know its position!
  - Ex: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), … and then lookup (d, "zhang");
[Figure: array holding ("li", 97), ("wang", 99), ("zhang", 100), …]
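
To make the cost concrete, here is a minimal sketch of a lookup that does not know where a key sits and therefore has to scan. The `struct pair` type and the name `linear_lookup` are assumptions made for this illustration; they do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type for this sketch: string key, int value. */
struct pair { const char *k; int v; };

/* Linear search: returns the index of key k in a[0..n-1], or -1 if absent.
   In the worst case every one of the n entries is inspected, hence O(n). */
int linear_lookup(const struct pair *a, int n, const char *k)
{
    assert(a != NULL && k != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        if (strcmp(a[i].k, k) == 0)
            return i;
    return -1;
}
```

With the three entries from the slide, looking up "zhang" walks past "li" and "wang" before it succeeds.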

Basic Plan
- Start from the array-based approach
  - Use an array A to hold elements (k, v)
- For every key k:
  - if we know its position (array index) i from k
  - then lookup, insert and delete are simple: A[i]
  - done in constant time O(1)
[Figure: array A with (k, v) stored at index i]

Example
- Ex: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), …; and then lookup (d, "zhang");
- Problem #1: How to calculate the index from the given key?
[Figure: array holding ("li", 97) at an unknown index]

Example
- Ex: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), …; and then lookup (d, "zhang");
- Problem #2: How long should the array be?
[Figure: array holding ("li", 97), total length unknown]

Basic Plan
- Save (k, v)s in an array, with index i calculated from key k
- Hash function: a method for computing an index from a given key
[Figure: hash ("li") gives the index at which ("li", 97) is stored]

Hash Function
- Given any key, compute an index
  - Efficiently computable
- Ideal goals: over all keys, the indexes are uniformly distributed
  - different keys map to different indexes
  - However, achieving this is a research problem in its own right :-(
- Next, we assume that the array is of infinite length, so the hash function has type:
    int hash (key k);
- To get some idea, next we perform a “case analysis” on how different key types affect “hash”

Hash Function On “int”
    // If the key of hash is of "int" type, the hash
    // function is trivial:
    int hash (int i)
    {
      return i;
    }

Hash Function On “char”
    // If the key of hash is of "char" type, the hash
    // function comes with type conversion:
    int hash (char c)
    {
      return c;
    }

Hash Function On “float”
    // Also type conversion:
    int hash (float f)
    {
      return (int)f;
    }
    // how to deal with 0.aaa, say 0.5?
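
The question on the slide is left open. One possible answer, offered here only as a sketch, is to hash the bit pattern of the float rather than its truncated value, so that 0.5 and 0.25 no longer both become 0. The name `hash_float_bits` is hypothetical, and the code assumes a 32-bit IEEE 754 `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hash a float by its bit pattern instead of its truncated value.
   Assumption: float is a 32-bit IEEE 754 value (true on mainstream platforms). */
int hash_float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);     /* well-defined way to read the bits */
    bits ^= bits >> 16;                 /* fold sign and exponent into the low half */
    return (int)(bits & 0x7fffffffu);   /* keep the result non-negative */
}
```

A caveat of this approach: 0.0 and -0.0 compare equal but have different bit patterns, so a real table would have to normalize such keys first.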

Hash Function On “string”
    // Example: "BillG":
    // A trivial one, but not so good:
    int hash (char *s)
    {
      int i=0, sum=0;
      while (s[i]) {
        sum += s[i];
        i++;
      }
      return sum;
    }
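
Why the additive hash is "not so good" can be seen with anagrams: any two strings with the same characters get the same sum. The sketch below repeats the slide's function as `hash_sum` (reading characters as unsigned) and contrasts it with a multiply-by-31 variant, `hash_poly`. The second function and both names are assumptions for illustration, not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

/* The additive hash from the slide: sum of the character codes. */
int hash_sum(const char *s)
{
    assert(s != NULL);
    int sum = 0;
    for (int i = 0; s[i]; i++)
        sum += (unsigned char)s[i];
    return sum;
}

/* A common alternative (assumption, not from the slides): multiply the
   running value by 31 before adding each character, so position matters.
   Unsigned arithmetic wraps around instead of overflowing. */
unsigned int hash_poly(const char *s)
{
    assert(s != NULL);
    unsigned int h = 0;
    for (int i = 0; s[i]; i++)
        h = 31u * h + (unsigned char)s[i];
    return h;
}
```

For "ab" and "ba", `hash_sum` returns 195 both times, while `hash_poly` returns 3105 and 3135.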

Hash Function On “Point”
    // Suppose we have a user-defined type:
    struct Point2d
    {
      int x;
      int y;
    };
    int hash (struct Point2d pt)
    {
      // ???
    }
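
The slide leaves the body open. One way to fill it in, shown here as a sketch and not as the intended answer, is to combine the hashes of the two fields so that (1, 2) and (2, 1) differ. The name `hash_point` and the multiplier 31 are assumptions; the code also assumes a 32-bit `unsigned int`.

```c
#include <assert.h>

struct Point2d
{
    int x;
    int y;
};

/* Combine both fields; unsigned arithmetic wraps instead of overflowing.
   The final mask keeps the result in the non-negative int range. */
int hash_point(struct Point2d pt)
{
    unsigned int h = 31u * (unsigned int)pt.x + (unsigned int)pt.y;
    return (int)(h & 0x7fffffffu);
}
```

Simply adding `pt.x + pt.y` would make every pair of mirrored points collide, which is the same weakness as the additive string hash.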

From “int” Hash to Index
- Recall the type:
    int hash (T data);
- Problems with the “int” return type:
  - At any time, the array is finite
  - no negative index (say -10)
- Our goal: int i ==> [0, N-1]
  - Ok, that’s easy! It’s just: abs(i) % N

Bug!
- Note the range of “int”: -2^31 ~ 2^31-1
  - So abs(-2^31) = 2^31: overflow!
- The key step is to wipe the sign bit off:
    int t = i & 0x7fffffff;
    int hc = t % N;
- In summary:
    hc = (i & 0x7fffffff) % N;
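
The masking step can be tried on the awkward inputs directly. The sketch below wraps it in a hypothetical helper `to_index` and assumes a 32-bit two's complement `int`; it deliberately never calls `abs` on the smallest int, since that call has undefined behavior in C.

```c
#include <assert.h>
#include <limits.h>

/* Map any int hash value to a bucket index in [0, n-1] by clearing
   the sign bit first. Assumes int is 32 bits wide. */
int to_index(int i, int n)
{
    assert(n > 0);
    assert(INT_MAX == 0x7fffffff);
    return (i & 0x7fffffff) % n;
}
```

For N = 8, the value -10 becomes 2147483638 after masking and lands in bucket 6, and the smallest int becomes 0 and lands in bucket 0.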

Collision
- Given two keys k1 and k2, we compute two hash codes hc1, hc2 in [0, N-1]
- If k1 <> k2, but hc1 == hc2, then a collision occurs
[Figure: (k1, v1) and (k2, v2) both map to index i]

Collision Resolution
- Open Addressing
- Re-hash
- Chaining (Multi-map)

Chaining
- For collision index i, we keep a separate linear list (chain) at index i
[Figure: index i points to a chain of k1 and k2, holding (k1, v1) and (k2, v2)]
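
A chain is just a singly linked list hanging off one array slot. The following sketch shows the two list operations a chained table needs; the node layout (string key, int value) and the names `chain_insert_head` and `chain_search` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chain node: one (k, v) pair plus a link to the next node. */
struct node { const char *k; int v; struct node *next; };

/* Link a new (k, v) node in front of head. Returns the new head,
   or NULL if allocation fails (the old chain is left untouched). */
struct node *chain_insert_head(struct node *head, const char *k, int v)
{
    assert(k != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->k = k;
    n->v = v;
    n->next = head;
    return n;
}

/* Walk the chain; returns the node whose key equals k, or NULL. */
struct node *chain_search(struct node *head, const char *k)
{
    assert(k != NULL);
    for (struct node *p = head; p != NULL; p = p->next)
        if (strcmp(p->k, k) == 0)
            return p;
    return NULL;
}
```

Inserting at the head takes constant time, while a search costs time proportional to the chain length, which is why chains should stay short.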

General Scheme
[Figure: a buckets array whose chains hold the keys k8, k1, k43, k2, k5]

Load Factor
- loadFactor = numItems/numBuckets
- defaultLoadFactor: default value of the load factor
[Figure: a buckets array whose chains hold the keys k8, k1, k43, k2, k5]

“hash” ADT: interface
    #ifndef HASH_H
    #define HASH_H
    typedef void *poly;
    typedef poly key;
    typedef poly value;
    typedef struct hashStruct *hash;
    hash newHash ();
    hash newHash2 (double lf);
    void insert (hash h, key k, value v);
    poly lookup (hash h, key k);
    void delete (hash h, key k);
    #endif

Hash Implementation
    #include "hash.h"
    #define EXT_FACTOR 2
    #define INIT_BUCKETS 16
    struct hashStruct
    {
      linkedList *buckets;
      int numBuckets;
      int numItems;
      double loadFactor;
    };

In Figure
[Figure: h points to a struct with fields buckets, numBuckets, numItems, loadFactor; buckets points to the array of chains holding k8, k1, k43, k2, k5]

“newHash ()”
    hash newHash ()
    {
      hash h = (hash)malloc (sizeof (*h));
      h->buckets = malloc (INIT_BUCKETS * sizeof (linkedList));
      for (…)
        // init the array
      h->numBuckets = INIT_BUCKETS;
      h->numItems = 0;
      h->loadFactor = 0.25;
      return h;
    }

“newHash2 ()”
    hash newHash2 (double lf)
    {
      hash h = (hash)malloc (sizeof (*h));
      h->buckets = (linkedList *)malloc (INIT_BUCKETS * sizeof (linkedList));
      for (…)
        // init the array
      h->numBuckets = INIT_BUCKETS;
      h->numItems = 0;
      h->loadFactor = lf;
      return h;
    }

“lookup (hash, key)”
    value lookup (hash h, key k, compTy cmp)
    {
      int i = k->hashCode (); // how to perform this?
      int hc = (i & 0x7fffffff) % (h->numBuckets);
      value t = linkedListSearch ((h->buckets)[hc], k);
      return t;
    }
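
C has no method call such as `k->hashCode ()` on a `void *` key, which is what the comment on the slide asks about. One common way around this, sketched below under stated assumptions, is to store a hash function pointer in the table and pass the comparison as a callback. The names `hashTy`, `struct table`, `lookup_sketch`, `str_hash` and `str_cmp` are hypothetical; only `poly` and `compTy` appear in the slides, and the signature of `compTy` is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *poly;
typedef int (*hashTy)(poly);        /* assumption: user-supplied hash */
typedef int (*compTy)(poly, poly);  /* assumption: returns 0 when keys are equal */

struct node { poly k; poly v; struct node *next; };
struct table { struct node **buckets; int numBuckets; hashTy hashCode; };

/* Find the bucket from the key's hash, then walk that bucket's chain. */
poly lookup_sketch(struct table *h, poly k, compTy cmp)
{
    assert(h != NULL && h->numBuckets > 0);
    int i = h->hashCode(k);
    int hc = (i & 0x7fffffff) % h->numBuckets;
    for (struct node *p = h->buckets[hc]; p != NULL; p = p->next)
        if (cmp(p->k, k) == 0)
            return p->v;
    return NULL;                    /* key not present */
}

/* Example callbacks for keys that are C strings. */
int str_hash(poly k)
{
    int sum = 0;
    for (const unsigned char *s = k; *s; s++)
        sum += *s;
    return sum;
}

int str_cmp(poly a, poly b)
{
    return strcmp((const char *)a, (const char *)b);
}
```

A table of strings would then be searched with `lookup_sketch (&t, "k43", str_cmp)`, and a table keyed by another type only needs a different pair of callbacks.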

Ex: lookup (ha, k43)
    hc = (hash (k43) & 0x7fffffff) % 8;   // hc = 1
- Go to bucket 1 of ha's buckets array
- Compare k43 with k8: no match, move on
- Compare k43 with k43: found!
[Figure: ha's buckets array with chains holding k8, k1, k43, k2, k5]

“insert”
    void insert (hash h, poly k, poly v)
    {
      if (1.0*h->numItems/h->numBuckets >= h->loadFactor) {
        // buckets extension & items re-hash;
      }
      int i = k->hashCode (); // how to perform this?
      int hc = (i & 0x7fffffff) % (h->numBuckets);
      tuple t = newTuple (k, v);
      linkedListInsertHead ((h->buckets)[hc], t);
      h->numItems++; // keep the count used by the load-factor test
      return;
    }
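
The "buckets extension & items re-hash" step is only a comment on the slide. The sketch below shows one way it could work, using string keys and int values to stay self-contained. All names (`table_new`, `table_grow`, `table_insert`, `table_find`, `index_of`) and the multiply-by-31 string hash are assumptions; `EXT_FACTOR` and `INIT_BUCKETS` carry the values from the implementation slide. Like the slide's insert, it does not check for duplicate keys.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EXT_FACTOR 2
#define INIT_BUCKETS 16

struct node { const char *k; int v; struct node *next; };
struct table { struct node **buckets; int numBuckets, numItems; double loadFactor; };

/* Bucket index of string key k in a table of n buckets. */
static int index_of(const char *k, int n)
{
    unsigned int h = 0;
    for (; *k; k++)
        h = 31u * h + (unsigned char)*k;
    return (int)((h & 0x7fffffffu) % (unsigned int)n);
}

/* Allocate n empty chains; returns NULL on failure. */
static struct node **new_buckets(int n)
{
    struct node **b = malloc((size_t)n * sizeof *b);
    for (int i = 0; b != NULL && i < n; i++)
        b[i] = NULL;
    return b;
}

struct table *table_new(double lf)
{
    struct table *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->buckets = new_buckets(INIT_BUCKETS);
    if (t->buckets == NULL) { free(t); return NULL; }
    t->numBuckets = INIT_BUCKETS;
    t->numItems = 0;
    t->loadFactor = lf;
    return t;
}

/* Allocate a larger array and relink every node into its new chain. */
static int table_grow(struct table *t)
{
    int n = t->numBuckets * EXT_FACTOR;
    struct node **nb = new_buckets(n);
    if (nb == NULL)
        return -1;
    for (int i = 0; i < t->numBuckets; i++) {
        struct node *p = t->buckets[i];
        while (p != NULL) {
            struct node *next = p->next;
            int hc = index_of(p->k, n);   /* index under the new size */
            p->next = nb[hc];
            nb[hc] = p;
            p = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->numBuckets = n;
    return 0;
}

/* Returns 0 on success, -1 if memory runs out. */
int table_insert(struct table *t, const char *k, int v)
{
    assert(t != NULL && k != NULL);
    if (1.0 * t->numItems / t->numBuckets >= t->loadFactor && table_grow(t) != 0)
        return -1;
    struct node *nd = malloc(sizeof *nd);
    if (nd == NULL)
        return -1;
    int hc = index_of(k, t->numBuckets);
    nd->k = k;
    nd->v = v;
    nd->next = t->buckets[hc];            /* insert at the head of the chain */
    t->buckets[hc] = nd;
    t->numItems++;
    return 0;
}

const struct node *table_find(const struct table *t, const char *k)
{
    for (const struct node *p = t->buckets[index_of(k, t->numBuckets)]; p; p = p->next)
        if (strcmp(p->k, k) == 0)
            return p;
    return NULL;
}
```

With a load factor of 0.25 and 16 initial buckets, the fifth insert finds 4/16 >= 0.25 and doubles the array to 32 buckets before linking the new node. Every existing node has to be re-indexed at that point because its index depends on the number of buckets.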

Ex: insert (ha, k13)
    hc = (hash (k13) & 0x7fffffff) % 8;   // suppose hc==4
[Figure: before the insert, the chains of ha's buckets array hold k8, k1, k43, k2, k5; after it, k13 is linked in at bucket 4]

Complexity
op \ rep    array   sorted array   linked list   sorted linked list   hash
lookup()    O(n)    O(lg n)        O(n)          O(n)                 O(1)
insert()    O(n)                                                      O(1)
delete()    O(n)    O(n)                                              O(1)