Lecture 20: Hashing, Amortized Analysis

Quick Selection
• Goal: Given an array of numbers, find the k-th smallest number.
• Example: a[] = {4, 2, 8, 6, 3, 1, 7, 5}, k = 3
• Output = 3
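The goal above can be sketched in Python as follows. This is a minimal sketch, not necessarily the lecture's exact algorithm: the random pivot, the three-way split, and the name `quick_select` are assumptions made for illustration, and k is taken to be 1-based as in the example.

```python
import random

def quick_select(a, k):
    """Return the k-th smallest element of a (k is 1-based, as in the slide's example)."""
    if not 1 <= k <= len(a):
        raise ValueError("k must be between 1 and len(a)")
    pivot = random.choice(a)               # assumption: pivot chosen uniformly at random
    left = [x for x in a if x < pivot]     # elements smaller than the pivot
    equal = [x for x in a if x == pivot]   # the pivot and any duplicates of it
    right = [x for x in a if x > pivot]    # elements larger than the pivot
    if k <= len(left):
        return quick_select(left, k)       # answer lies in the left part
    if k <= len(left) + len(equal):
        return pivot                       # answer is the pivot itself
    return quick_select(right, k - len(left) - len(equal))

print(quick_select([4, 2, 8, 6, 3, 1, 7, 5], 3))  # 3
```

Only one of the two parts is searched recursively, which is what distinguishes selection from a full sort.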

Recursion
• [Equation not recovered; its terms were labeled Left Part, Right Part, and Split cost.]

Motivation: Set and Map
• Goal: An array whose index can be any object.
• Example: Dictionary[“hash”] = “a dish of diced or chopped meat and often vegetables…”
• Properties:
  1. Efficient lookup: we hope lookup is O(1).
  2. Space: within a constant factor of a list.
• This lecture: maintain a set of numbers from 0 to N-1, where N is very large (think N = 2^32 or 2^64).

Naïve implementation of a set
• Method 1: Maintain a linked list.
• Problem: Lookup takes O(n) time.
• Method 2: Use a large array, with a[i] = 1 if i is in the set.
• Problem: Needs a huge amount of memory.

Hashing
• Idea: for each number, assign a random location.
• Example: {3, 10, 3424, 643523}
• Store number i in a[f(i)].
• f(i): hash function.

Collisions
• Problem: we want to add 123, but f(123) = 4 = f(3424).
• (This will always happen, by the pigeonhole principle.)
• Solution: 123 and 3424 will share this location.
• Table: [null, 10, null, 3, 3424 → 123, null, 643523]
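A minimal Python sketch of the "share this location" idea stores a short list (a chain) in every slot. The class name `ChainedSet`, the table size of 7, and the slot numbers for 3, 10, and 643523 are assumptions made for illustration; the source only states that f(123) = 4 = f(3424).

```python
class ChainedSet:
    """Set of numbers in which colliding elements share a slot via a per-slot list."""

    def __init__(self, num_slots, hash_fn):
        self.slots = [[] for _ in range(num_slots)]  # one chain per location
        self.hash_fn = hash_fn

    def add(self, x):
        chain = self.slots[self.hash_fn(x)]
        if x not in chain:          # keep set semantics: no duplicates
            chain.append(x)

    def __contains__(self, x):
        # Only the chain at f(x) is scanned, not the whole table.
        return x in self.slots[self.hash_fn(x)]

# Hypothetical placements standing in for f; a real hash function would be computed.
placement = {10: 1, 3: 3, 3424: 4, 123: 4, 643523: 6}
s = ChainedSet(7, placement.__getitem__)
for x in [3, 10, 3424, 643523, 123]:
    s.add(x)

print(s.slots)  # [[], [10], [], [3], [3424, 123], [], [643523]]
print(123 in s)  # True
```

Lookup cost is proportional to the length of one chain, so the table stays fast as long as chains stay short.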

Fixed Hash Function
• If the hash function is fixed, then the table can be very slow on some bad inputs.
• Example: We can find n numbers x_1, x_2, …, x_n such that f(x_i) = y for some fixed y (always possible by the pigeonhole principle).
• Then the hash table degenerates into a linked list.
• Solution: Use a family of random hash functions.
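The bad example can be made concrete with a small Python sketch. The hash function `fixed_hash` (x mod 8) is a hypothetical choice for illustration only; the same attack applies to any fixed function, although finding the colliding inputs may be less direct.

```python
def fixed_hash(x, num_slots=8):
    """Hypothetical fixed hash function (x mod num_slots), chosen only for illustration."""
    return x % num_slots

def colliding_inputs(y, n, num_slots=8):
    """Return n distinct numbers that fixed_hash maps to the same slot y."""
    return [y + i * num_slots for i in range(n)]

bad = colliding_inputs(y=4, n=5)
print(bad)                           # [4, 12, 20, 28, 36]
print({fixed_hash(x) for x in bad})  # {4}
```

All five numbers land in slot 4, so a lookup among them has to scan a single chain of length n.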

When do we “randomly select” the hash function?
• Idea 1: Choose a new hash function every time we make a query.
• This does not work. We may store 123 at position 4 because f(123) = 4, but after we choose a new hash function f’, f’(123) may not equal 4.
• Idea 2: Choose a random hash function when creating the hash table.
• This ensures that we can access the numbers consistently, but the analysis must account for the random choice.
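Idea 2 can be sketched in Python as a table that draws its hash function once, in the constructor, and reuses it for every later operation. The family h(x) = ((a·x + b) mod p) mod m used here is a common textbook choice and is an assumption; the source does not say which family the lecture uses. The class name `RandomHashSet` and the prime p = 2^89 - 1 are likewise illustrative.

```python
import random

class RandomHashSet:
    """Chained hash set whose hash function is chosen randomly at creation time."""

    P = 2**89 - 1  # a Mersenne prime larger than 2**64 (assumed upper bound on keys)

    def __init__(self, num_slots=8, seed=None):
        rng = random.Random(seed)
        self.m = num_slots
        # Drawn once here and never changed, so stored numbers stay findable.
        self.a = rng.randrange(1, self.P)
        self.b = rng.randrange(0, self.P)
        self.slots = [[] for _ in range(num_slots)]

    def _hash(self, x):
        return ((self.a * x + self.b) % self.P) % self.m

    def add(self, x):
        chain = self.slots[self._hash(x)]
        if x not in chain:
            chain.append(x)

    def __contains__(self, x):
        return x in self.slots[self._hash(x)]

table = RandomHashSet()
for x in [3, 10, 3424, 643523, 123]:
    table.add(x)

print(123 in table, 124 in table)  # True False
```

Two tables created separately will generally place the same number in different slots, but each table is consistent with itself.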

Universal Hash Function

Amortized Analysis

“Amortized”
• verb (used with object), amortized, amortizing.
• 1. Finance.
• to liquidate or extinguish (a mortgage, debt, or other obligation), especially by periodic payments to the creditor or to a sinking fund.
• to write off a cost of (an asset) gradually.
Definition from Dictionary.com

Amortized Analysis in Algorithms
• Scenario: Operation A is repeated many times in an algorithm.
• In some cases, Operation A is very fast.
• In some other cases, Operation A can be very slow.
• Idea: If the bad cases don’t happen very often, then the average cost of Operation A can still be small.

Amortized Analysis in disguise
• Merge Sort
Merge(b[], c[])
1. a[] = empty
2. i = 1
3. FOR j = 1 to length(c[])
4.   WHILE i <= length(b[]) AND b[i] < c[j]
5.     a.append(b[i]); i = i+1
6.   a.append(c[j])
7. append the remaining elements of b[] (from index i on) to a[]
8. RETURN a[]
• For each iteration, steps 4-5 can take a different amount of time.
• Worst case: O(n) per iteration, so O(n^2) in total?
• No: the total amount of time steps 4-5 can take is O(n), because i only increases.
• “Amortized Cost” = O(1)
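The amortized argument for Merge can be checked with a small Python sketch that counts how often the inner loop body runs. The counter `inner_steps`, the bounds check on `i`, and the final `extend` are additions for illustration; indices are 0-based here, unlike the pseudocode.

```python
def merge(b, c):
    """Merge two sorted lists; also return how many times the inner loop body ran."""
    a = []
    i = 0
    inner_steps = 0
    for x in c:
        # Inner loop: i only moves forward, so its total work over all
        # iterations of the outer loop is at most len(b).
        while i < len(b) and b[i] < x:
            a.append(b[i])
            i += 1
            inner_steps += 1
        a.append(x)
    a.extend(b[i:])  # leftover elements of b that are larger than everything in c
    return a, inner_steps

merged, steps = merge([1, 2, 3, 9], [4, 5, 6])
print(merged, steps)  # [1, 2, 3, 4, 5, 6, 9] 3
```

One outer iteration did all three inner steps and the other two did none, yet the total never exceeds len(b).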

Amortized Analysis in disguise
• DFS
• For each vertex, the number of edges can be different.
• Example: a graph has m = 5n edges, and one vertex is connected to n/2 other vertices.
• Worst case for a vertex: O(n), so O(n^2) in total?
• No: the total amount of time is proportional to the number of edges.
• “Amortized Cost” = O(m/n + 1)
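The DFS claim can be illustrated with a short Python sketch that counts adjacency-list entries scanned. The graph below (a hub vertex 0 plus one extra edge) and the names `dfs` and `counter` are assumptions made for illustration.

```python
def dfs(adj, u, visited, counter):
    """Recursive DFS from u; counter[0] counts adjacency-list entries scanned."""
    visited.add(u)
    for v in adj[u]:
        counter[0] += 1            # one unit of work per (u, v) entry
        if v not in visited:
            dfs(adj, v, visited, counter)

# Undirected graph with 5 vertices and m = 5 edges; vertex 0 has degree 4.
adj = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1],
    3: [0],
    4: [0],
}
visited, counter = set(), [0]
dfs(adj, 0, visited, counter)
print(len(visited), counter[0])  # 5 10
```

Every vertex scans its own adjacency list exactly once, so the total is the sum of the degrees, 2m = 10, regardless of how unevenly the edges are distributed.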

Dynamic Array problem
• Design a data structure to store an array.
• Items can be added to the end of the array.
• At any time, the amount of memory should be proportional to the length of the array.
• Example: ArrayList in Java, vector in C++
• Goal: Design a data structure such that adding an item has O(1) amortized running time.

Why the naïve approach does not work
• a: 1 2 3 4 5 6 7
• a.add(8): 1 2 3 4 5 6 7 8
  Need to allocate a new piece of memory, copy the first 7 elements, and add 8.
• a.add(9): 1 2 3 4 5 6 7 8 9
  Need to allocate a new piece of memory, copy the first 8 elements, and add 9.
• Running time for n add operations = O(n^2)!
• Amortized cost = O(n^2)/n = O(n)
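The quadratic cost can be confirmed by counting element copies in a small Python simulation. The function name `naive_append_copies` is hypothetical, and Python lists are used only to model fixed-size blocks of memory.

```python
def naive_append_copies(n):
    """Count element copies when every add reallocates an array exactly one slot larger."""
    data = []
    copies = 0
    for x in range(1, n + 1):
        new_data = [None] * (len(data) + 1)  # allocate a new piece of memory
        for i, v in enumerate(data):         # copy all existing elements over
            new_data[i] = v
            copies += 1
        new_data[-1] = x                     # place the new item at the end
        data = new_data
    return copies

print(naive_append_copies(9))     # 36
print(naive_append_copies(1000))  # 499500
```

The count is 0 + 1 + … + (n-1) = n(n-1)/2 copies, which is about n/2 copies per add on average.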