Hashing, Sets, Dictionaries, Code Cleaning, Expandable Array Stacks, and Amortized Analysis

Hashing so far
To store 250 IP addresses in a table:
• Pick a prime just bigger than 250 (n = 257)
• Pick a1, …, a4 mod 257 (once and for all)
• To hash x = (x1, …, x4):
  – Compute u = a1*x1 + … + a4*x4 mod 257
  – Store x in a bucket at myArray[u]

Generalization
• Old: store 250 IP addresses in a table
• New: store n1 items, each between 0 and N

Generalization
To store n1 items between 0 and N:
• Pick a prime n just bigger than n1
• Let k = round_up(log_n N)
  – Each “item” can be written as a k-digit number, base n
• Pick a1, …, ak mod n (once and for all)
• To hash x = (x1, …, xk):
  – Compute u = a1*x1 + … + ak*xk mod n
  – Store x in a bucket at myArray[u]

Example
• Store 8 items, each represented by 16 bits (i.e., between 0 and 2^16 - 1 = 65535)
• Solution: pick the prime n = 11
• log_11 65535 = 4.625…, so we pick k = 5
• Pick 5 numbers a1, …, a5 mod 11: 3, 10, 0, 5, 2

Example (cont.)
• Multipliers: 3, 10, 0, 5, 2
• Typical “key”: 31905
• Convert to base 11:
  – Mod(31905, 11) = 5
  – Div(31905, 11) = 2900
  – Mod(2900, 11) = 7
  – Div(2900, 11) = 263
  – …
  – 31905 in base 11 is 21A75 [“A” means “10”]
• Hash = 3*2 + 10*1 + 0*A + 5*7 + 2*5 mod 11 = 61 mod 11 = 6
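The base conversion and the weighted sum in this example can be checked with a short Python sketch. The function names `to_digits` and `hash_key` are illustrative choices, not names from the slides; digits are listed most significant first so that they line up with the multipliers.

```python
def to_digits(x, base, k):
    """Return the k base-`base` digits of x, most significant first."""
    digits = []
    for _ in range(k):
        digits.append(x % base)   # Mod(x, base): next least significant digit
        x //= base                # Div(x, base): drop that digit
    return digits[::-1]


def hash_key(x, multipliers, n):
    """Hash x as (a1*x1 + ... + ak*xk) mod n, where the xi are the base-n digits of x."""
    digits = to_digits(x, n, len(multipliers))
    return sum(a * d for a, d in zip(multipliers, digits)) % n


multipliers = [3, 10, 0, 5, 2]            # a1..a5 from the example
print(to_digits(31905, 11, 5))            # [2, 1, 10, 7, 5], i.e. 21A75
print(hash_key(31905, multipliers, 11))   # 6
```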

In practice
• Usually items aren’t given as integers between 0 and some large number N
• Doing arithmetic (like “finding the digits”) on big numbers (larger than the language can represent) is a pain algorithmically
• Frequently we have an “identifier” that’s a few bytes long, often encoded as a string of characters

Practice, cont’d
• Assume objects have k-byte identifiers x = (x1, …, xk)
• Compute u = a1*x1 + … + ak*xk mod n
• Put (x, object) into hash bucket u
• This works as long as n > 256 (the number of possible byte values)
• Otherwise the assumption of uniformly distributed hash indexes is wrong
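Hashing a k-byte identifier might look like the following sketch. The helper name `make_byte_hash` and the fixed seed are assumptions made for illustration; the only idea taken from the slide is that each byte serves directly as one "digit" of the key.

```python
import random


def make_byte_hash(k, n, seed=0):
    """Pick k multipliers mod n once, then return a hash function for k-byte ids."""
    rng = random.Random(seed)   # fixed seed only so the sketch is repeatable
    a = [rng.randrange(n) for _ in range(k)]

    def h(identifier: bytes) -> int:
        if len(identifier) != k:
            raise ValueError("identifier must be exactly k bytes long")
        # Iterating over bytes yields ints in 0..255, each a valid digit when n > 256.
        return sum(ai * xi for ai, xi in zip(a, identifier)) % n

    return h


h = make_byte_hash(k=4, n=257)
u = h(b"ab12")   # bucket index in range(257)
```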

The SET Abstract Data Type
• create(n): creates a new set structure, initially empty but capable of holding up to n elements.
• empty(S): checks whether the set S is empty.
• size(S): returns the number of elements in S.
• element_of(x, S): checks whether the value x is in the set S.
• enumerate(S): yields the elements of S in some arbitrary order.
• add(S, x): adds the element x to S, if it is not there already.
• delete(S, x): removes the element x from S, if it is there.

Implementing sets
• Can use a hashtable:
  – “create”, “empty”, and “size” are trivial
  – “enumerate”: take all elements in all buckets
  – “add” is just “insert”; “delete” is “delete”
  – “element_of” is just “find”
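A minimal sketch of the SET operations on top of a bucket array follows. The class name `HashSet` is illustrative, and for brevity it uses Python's built-in `hash()` as a stand-in for the multiplier-based hash described earlier.

```python
class HashSet:
    """Set ADT on top of an array of buckets (each bucket is a Python list)."""

    def __init__(self, n):
        self.buckets = [[] for _ in range(n)]
        self.count = 0

    def _bucket(self, x):
        # Stand-in hash: Python's built-in hash() reduced mod the table size.
        return self.buckets[hash(x) % len(self.buckets)]

    def empty(self):
        return self.count == 0

    def size(self):
        return self.count

    def element_of(self, x):
        return x in self._bucket(x)        # "find" within one bucket

    def enumerate(self):
        for bucket in self.buckets:        # all elements in all buckets
            yield from bucket

    def add(self, x):
        bucket = self._bucket(x)
        if x not in bucket:                # insert only if not already present
            bucket.append(x)
            self.count += 1

    def delete(self, x):
        bucket = self._bucket(x)
        if x in bucket:                    # remove only if present
            bucket.remove(x)
            self.count -= 1


s = HashSet(11)
s.add(3)
s.add(3)      # duplicate, ignored
s.add(14)
print(s.size(), s.element_of(14))   # 2 True
```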

DICTIONARY ADT
• Create, empty, size as in SET
• Still to do:
  – Insert(key, value)
  – Find(key)
• Sometimes called “store” and “fetch”
• A dictionary is sometimes called a “map”
  – “key” is ‘mapped to’ “value”
• Closely related to a “database”
• May allow several values for one key
  – Find(key) returns a list of values in this case

Implementing a dictionary
• Create(n)
  – Build an array of prime size a little more than n, each entry an empty list
  – Pick k numbers, mod n, to handle keys of length k

• Insert(key, value)
  – Let u = (a1*key1 + … + ak*keyk) mod n
  – Insert (key, value) into array[u]
• Find(key)
  – Let u = (a1*key1 + … + ak*keyk) mod n
  – Search for (key, *) in array[u]
  – If you find (key, val), return val
  – Else return None
• (Modify as appropriate to return a list of vals)
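The Create, Insert, and Find steps can be sketched in Python as below. The class name `HashDict`, the seeded random multipliers, and the choice of `bytes` keys of length k are assumptions for illustration; the caller is assumed to pass a suitable prime n.

```python
import random


class HashDict:
    """Dictionary ADT: keys are k-byte identifiers, hashed with fixed multipliers."""

    def __init__(self, n, k, seed=0):
        # n is assumed to be a prime a little larger than the expected item count.
        self.n = n
        rng = random.Random(seed)
        self.a = [rng.randrange(n) for _ in range(k)]   # chosen once and for all
        self.array = [[] for _ in range(n)]             # each entry an empty list

    def _index(self, key: bytes) -> int:
        # u = (a1*key1 + ... + ak*keyk) mod n; keys are assumed to have length k.
        return sum(ai * xi for ai, xi in zip(self.a, key)) % self.n

    def insert(self, key: bytes, value):
        self.array[self._index(key)].append((key, value))

    def find(self, key: bytes):
        for stored_key, value in self.array[self._index(key)]:
            if stored_key == key:
                return value
        return None


d = HashDict(n=257, k=4)
d.insert(b"abcd", "first object")
print(d.find(b"abcd"))   # first object
print(d.find(b"zzzz"))   # None
```

As written, `find` returns the first value stored under a key; returning every match instead would give the several-values-per-key variant.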

Summary
• We can now assume that we can create a SET or a DICT with expected O(1) insertion and lookup times whenever we need one
• After this week’s HW, you can further assume that we don’t need to know the size of the SET or the DICT in advance

Example Application: JUMBLE!

JUMBLE
• Input: list of all 5-letter words in English
• Each word represented as an array of five characters
• Output: all words for which no other permutation is a word

Solution
• Start with an empty dictionary D (allowing several values per key)
• Foreach word w
  – Sort letters alphabetically to get wnew
  – D.insert(wnew, w)
• Foreach word w
  – Sort alphabetically again to get wnew
  – If D(wnew) contains anything except w
    • Skip w
  – Else output w
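The two passes above can be sketched in a few lines of Python. Here `collections.defaultdict(set)` plays the role of the dictionary that allows several values per key; the function name and the sample word list are made up for illustration.

```python
from collections import defaultdict


def words_without_anagrams(words):
    """Return the words for which no other permutation of the letters is a word."""
    d = defaultdict(set)
    for w in words:
        d["".join(sorted(w))].add(w)   # key: the letters of w in sorted order
    # Output w only when it is the sole entry stored under its sorted key.
    return [w for w in words if len(d["".join(sorted(w))]) == 1]


print(words_without_anagrams(["angle", "angel", "glean", "house", "zebra"]))
# ['house', 'zebra']
```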

Clean Your Code
• Errors per line ~ constant
  – Fewer lines means fewer errors overall!
• Easier to grade
  – More likely to get credit
• Cleaner code = cleaner thinking
  – Better understanding of material

LCA(u, v)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if udepth > vdepth then
        u = T.parent(u)
        udepth = udepth - 1
      else if vdepth > udepth then
        v = T.parent(v)
        vdepth = vdepth - 1
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if udepth > vdepth then          # needlessly complex
        u = T.parent(u)
        udepth = udepth - 1
      else if vdepth > udepth then
        v = T.parent(v)
        vdepth = vdepth - 1
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  udepth = T.depth(u)                  # now irrelevant
  vdepth = T.depth(v)                  # now irrelevant
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if (T.isroot(u) = true) or (T.isroot(v) = true) then    # “= true” is redundant
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root                       # it’s the answer; return it!
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  while (lca = null) do
    if (u = v) then
      lca = u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  while (lca = null) do                # condition is irrelevant
    if (u = v) then
      lca = u
      return lca
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  repeat
    if (u = v) then
      lca = u                          # lca is no longer used!
      return lca
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)

LCA(u, v, T)
  if T.isroot(u) or T.isroot(v) then
    return T.root
  repeat
    if (u = v) then
      return u
    else
      if T.depth(u) > T.depth(v) then
        u = T.parent(u)
      else if T.depth(v) > T.depth(u) then
        v = T.parent(v)
      else
        u = T.parent(u)
        v = T.parent(v)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) then
    return T.root
  repeat
    if (u = v) then
      return u
    else
      u = T.parent(u)
      v = T.parent(v)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) or (u = v) then
    return u
  repeat                               # [OOPS!]
  else
    u = T.parent(u)
    v = T.parent(v)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) or (u = v) then
    return u
  else
    return LCA(T.parent(u), T.parent(v), T)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  if T.isroot(u) or (u = v) then       # isroot test not needed
    return u
  else
    return LCA(T.parent(u), T.parent(v), T)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)        # called during recursion, but no effect
    u = T.parent(u)
  while T.depth(v) > T.depth(u)        # called during recursion, but no effect
    v = T.parent(v)
  if (u = v) then
    return u
  else
    return LCA(T.parent(u), T.parent(v), T)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  return LCAsimple(u, v, T)

LCAsimple(u, v, T)
  # LCA for case where u and v have same depth
  if (u = v)
    return u
  else
    return LCAsimple(T.parent(u), T.parent(v), T)
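For reference, here is a runnable Python sketch of the cleaned-up algorithm. The `Tree` class is a hypothetical minimal helper (a child-to-parent map) standing in for the slides' T; only `parent` and `depth` are needed. Note that `lca` passes the leveled nodes themselves to `lca_simple`, so that a node which is an ancestor of the other is returned correctly.

```python
class Tree:
    """Minimal rooted tree stored as a child -> parent map (hypothetical helper)."""

    def __init__(self, parent_of):
        self.parent_of = parent_of   # the root maps to None

    def parent(self, u):
        return self.parent_of[u]

    def depth(self, u):
        d = 0
        while self.parent_of[u] is not None:   # walk up to the root, counting edges
            u = self.parent_of[u]
            d += 1
        return d


def lca_simple(u, v, T):
    """LCA for the case where u and v have the same depth."""
    if u == v:
        return u
    return lca_simple(T.parent(u), T.parent(v), T)


def lca(u, v, T):
    """Lowest common ancestor of u and v: level the depths, then climb together."""
    while T.depth(u) > T.depth(v):
        u = T.parent(u)
    while T.depth(v) > T.depth(u):
        v = T.parent(v)
    return lca_simple(u, v, T)


#        r
#       / \
#      a   b
#     / \
#    c   d
#    |
#    e
T = Tree({"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "c"})
print(lca("e", "d", T))   # a
print(lca("e", "b", T))   # r
print(lca("e", "a", T))   # a
```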

DONE!

STACK
• Stack operations:
  – push, pop, size, isEmpty
• (Partial) Implementation:
  – Array-based stack

ArrayStack
INIT:
  data = array[20]
  count = 0            // next empty space
------------------------------
Push(obj o):
  if count < 20
    data[count] = o
    count++
  else
    ERROR(“Overfull Stack”)

ArrayStack
pop():
  if count == 0
    ERROR(“Can’t pop from empty Stack”)
  else
    count--
    return data[count]   // after the decrement, count indexes the top item

ArrayStack
size():
  return count
isEmpty():
  return count == 0

Analysis

ArrayStack
INIT:
  data = array[20]
  count = 0            // next empty space
------------------------------
Push(obj o):           // O(1)
  if count < 20
    data[count] = o
    count++
  else
    ERROR(“Overfull Stack”)

ArrayStack
pop():                 // O(1)
  if count == 0
    ERROR(“Can’t pop from empty Stack”)
  else
    count--
    return data[count]   // after the decrement, count indexes the top item

ArrayStack
size():                // O(1)
  return count
isEmpty():             // O(1)
  return count == 0

Summary • Fast but not very useful

ExpandableArrayStack
INIT:
  data = array[20]
  count = 0            // next empty space
  capacity = 20

Push(obj o):
  if count < capacity
    data[count] = o
    count++
  else
    d2 = new Array[capacity + 1]
    for j = 0 to capacity - 1
      d2[j] = data[j]
    capacity = capacity + 1
    data = d2
    push(o)

Expandable Array Stack • All other operations remain the same

Analysis
• In the worst case, the time taken by a push is O(n)
• If we insert items 21, 22, …, 20+k, we’ll have done k operations, with total work
  21 + 22 + … + (20+k) = (20+1) + (20+2) + … + (20+k) = 20k + (1 + 2 + … + k) = 20k + k(k+1)/2 = O(k^2)
• So average time per operation is O(k) as well!
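The sum above can be checked numerically. The sketch below assumes a cost model of one unit per element copied plus one unit to store the new item, which gives 21 units for item 21, 22 for item 22, and so on; the function name is illustrative.

```python
def grow_by_one_work(k, initial_capacity=20):
    """Total work to push k items onto a full stack that grows by one slot each time."""
    capacity = initial_capacity
    work = 0
    for _ in range(k):
        work += capacity + 1   # copy `capacity` old items, then store the new one
        capacity += 1
    return work


k = 1000
print(grow_by_one_work(k))         # 520500
print(20 * k + k * (k + 1) // 2)   # 520500, the closed form 20k + k(k+1)/2
```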

Better: avoid frequent expansion
• Instead of adding a little space, add a lot!
• Double the array size when it gets full

DoublingArrayStack:
Push(obj o):
  if count < capacity
    data[count] = o
    count++
  else
    d2 = new Array[2*capacity]
    for j = 0 to capacity - 1
      d2[j] = data[j]
    capacity = 2*capacity
    data = d2
    push(o)

Doubling Array Stack • All other operations remain the same
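Putting the pieces together, a doubling array stack might be sketched in Python as follows. A fixed-length list stands in for a raw array (Python lists already resize themselves, so this is purely illustrative), and the method names follow Python conventions rather than the slides.

```python
class DoublingArrayStack:
    """Array-based stack whose backing array doubles when it fills up."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self.data = [None] * capacity   # fixed-size list standing in for an array
        self.count = 0                  # index of the next empty space

    def push(self, o):
        if self.count == self.capacity:
            d2 = [None] * (2 * self.capacity)
            for j in range(self.capacity):   # copy the old contents, O(n)
                d2[j] = self.data[j]
            self.capacity *= 2
            self.data = d2
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("can't pop from empty stack")
        self.count -= 1
        return self.data[self.count]

    def size(self):
        return self.count

    def is_empty(self):
        return self.count == 0


s = DoublingArrayStack()
for i in range(50):
    s.push(i)
print(s.size(), s.capacity, s.pop())   # 50 80 49
```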

Analysis
Push(obj o):
  if count < capacity            // this branch: O(1)
    data[count] = o
    count++
  else                           // this branch: O(n)
    d2 = new Array[2*capacity]
    for j = 0 to capacity - 1
      d2[j] = data[j]
    capacity = 2*capacity
    data = d2
    push(o)

Analysis
• In the worst case, the time taken by a push is O(n)
• But over the course of many operations, average time per operation is O(1)

“Total Work Analysis”
• If we have an array with n elements
  …and do n operations
  …then total work is no more than 4n
• Work per operation, on average, is at most 4
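A quick simulation illustrates the bound. It assumes a cost model of one unit per element copied during a doubling plus one unit per stored item; the slide's constant 4 may be counting work slightly differently, so treat the numbers as a sanity check rather than a proof.

```python
def doubling_work(num_pushes, initial_capacity=20):
    """Total work for num_pushes pushes onto an initially empty doubling stack."""
    capacity, count, work = initial_capacity, 0, 0
    for _ in range(num_pushes):
        if count == capacity:
            work += capacity   # copy every existing element into the new array
            capacity *= 2
        work += 1              # store the new element
        count += 1
    return work


for m in (100, 10_000, 1_000_000):
    print(m, doubling_work(m) / m)   # average work per push stays below 4
```

Under this model the copies add up to less than twice the number of pushes, which is why the average stays constant no matter how large the stack grows.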

Alternative view
• “Amortized” analysis:
  – For each operation that takes one unit of time
    • Place an extra unit of time “in the bank”
  – By the time an expensive operation arrives
    • Use your savings to pay for it
• Alternative view:
  – When you do an expensive operation
    • Pay one unit now
    • Pay an extra unit for each of the next n operations

Language
• For hashing: “the ‘find’ operation runs in expected O(1) time”
• For doubling array stacks: “the ‘push’ operation runs in O(1) amortized time, with O(n) worst-case time.”

Pixel boundaries (if time)