MCA 301: Design and Analysis of Algorithms
Instructor: Neelima Gupta (ngupta@cs.du.ac.in)

Expected Running Times and Randomized Algorithms

Expected Running Time of Insertion Sort
Input: $x_1, x_2, \ldots, x_{i-1}, x_i, \ldots, x_n$
For $i = 2$ to $n$: insert the $i$th element $x_i$ into the partially sorted list $x_1, x_2, \ldots, x_{i-1}$ (at the $r$th position).

Expected Running Time of Insertion Sort
• Let $X_i$ be the random variable representing the number of comparisons required to insert the $i$th element of the input array into the sorted subarray of the first $i-1$ elements.
• $X_i$ takes values in $1, \ldots, i-1$. Write $x_{ij}$ for its value when $x_i$ is inserted at the $j$th position, $1 \le j \le i$.
• $E(X_i) = \sum_{j=1}^{i} x_{ij}\, p(x_{ij})$, where $E(X_i)$ is the expected value of $X_i$ and $p(x_{ij})$ is the probability of inserting $x_i$ at the $j$th position.

Expected Running Time of Insertion Sort
Input: $x_1, x_2, \ldots, x_{i-1}, x_i, \ldots, x_n$, with $x_i$ inserted at the $j$th position.
• How many comparisons does it take to insert the $i$th element at the $j$th position?

Position:          $i$    $i-1$    $i-2$    ...    $2$      $1$
# of comparisons:  $1$    $2$      $3$      ...    $i-1$    $i-1$

Note: positions 2 and 1 both need $i-1$ comparisons. Why? To insert at position 2 we must compare with the element that is currently first, and after that comparison we already know which of the two comes first and which comes second.

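• Thus $E(X_i) = \frac{1}{i}\left\{\sum_{k=1}^{i-1} k + (i-1)\right\}$, where $1/i$ is the probability of inserting at the $j$th of the $i$ possible positions.
• For $n$ elements, $E(X_1 + X_2 + \cdots + X_n) = \sum_{i=2}^{n} E(X_i) = \sum_{i=2}^{n} \frac{1}{i}\left\{\sum_{k=1}^{i-1} k + (i-1)\right\} = \frac{(n-1)(n+4)}{4} - \sum_{i=2}^{n}\frac{1}{i}$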

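For $n$ elements, the expected time taken is
$T = \sum_{i=2}^{n} \frac{1}{i}\left\{\sum_{k=1}^{i-1} k + (i-1)\right\}$,
where $1/i$ is the probability of inserting at the $r$th of the $i$ possible positions.
By linearity of expectation, $E(X_1 + X_2 + \cdots + X_n) = \sum_{i=1}^{n} E(X_i)$, where $X_i$ is the number of comparisons made when inserting the $i$th element (so $X_1 = 0$).
$T = \frac{(n-1)(n+4)}{4} - \sum_{i=2}^{n}\frac{1}{i}$
Therefore the average case of insertion sort takes $\Theta(n^2)$ time.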
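The expectation above can be checked numerically for small inputs. The sketch below is an illustration only: it assumes an insertion sort that scans the sorted prefix from right to left, and the function names are made up for this example. It averages the comparison count over all orderings of n distinct keys and compares the result with the sum formula.

```python
from fractions import Fraction
from itertools import permutations


def insertion_sort_comparisons(items):
    """Sort a copy of items and return the number of key comparisons made."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Scan the sorted prefix from right to left.
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return comparisons


def expected_by_formula(n):
    """Sum over i = 2..n of (1/i) * (1 + 2 + ... + (i-1) + (i-1))."""
    return sum(Fraction(sum(range(1, i)) + (i - 1), i) for i in range(2, n + 1))


def expected_by_enumeration(n):
    """Average comparison count over all n! orderings of distinct keys."""
    counts = [insertion_sort_comparisons(p) for p in permutations(range(n))]
    return Fraction(sum(counts), len(counts))
```

For n = 3 both functions return 8/3: the second element always costs 1 comparison, and the third costs 1 comparison with probability 1/3 and 2 comparisons otherwise.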

Quick-Sort
• Pick the first item from the array and call it the pivot.
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
• Use recursion to sort the two partitions.
partition 1: items ≤ pivot; partition 2: items > pivot

Quicksort: Expected number of comparisons
• Partition may generate the splits (0 : n-1, 1 : n-2, 2 : n-3, …, n-2 : 1, n-1 : 0), each with probability 1/n.
• If T(n) is the expected running time,
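One standard way to complete this statement is the recurrence below. It rests on two assumptions that are not taken from the slide: T(n) counts expected comparisons, and partitioning n items costs n - 1 comparisons.

```latex
\begin{align*}
T(n) &= (n-1) + \frac{1}{n}\sum_{k=0}^{n-1}\bigl(T(k) + T(n-1-k)\bigr)
      = (n-1) + \frac{2}{n}\sum_{k=0}^{n-1} T(k), \qquad T(0) = T(1) = 0, \\
T(n) &= 2(n+1)H_n - 4n = O(n \log n), \qquad H_n = \sum_{i=1}^{n} \frac{1}{i}.
\end{align*}
```

As a quick check, T(2) = 1 and T(3) = 2 + (2/3)(0 + 0 + 1) = 8/3, which agrees with the closed form.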

Randomized Quick-Sort
• Pick an element from the array at random and call it the pivot.
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
• Use recursion to sort the two partitions.
partition 1: items ≤ pivot; partition 2: items > pivot
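The three steps above can be sketched in a few lines. This is a minimal, not-in-place version written for readability; the function name and the optional rng parameter are assumptions made for this example.

```python
import random


def randomized_quicksort(items, rng=random):
    """Return a new sorted list; the pivot is chosen uniformly at random."""
    if len(items) <= 1:
        return list(items)
    pivot_index = rng.randrange(len(items))
    pivot = items[pivot_index]
    # Everything except the chosen pivot gets partitioned.
    rest = items[:pivot_index] + items[pivot_index + 1:]
    left = [x for x in rest if x <= pivot]   # partition 1: items <= pivot
    right = [x for x in rest if x > pivot]   # partition 2: items > pivot
    return randomized_quicksort(left, rng) + [pivot] + randomized_quicksort(right, rng)
```

Passing a seeded random.Random instance as rng makes a run repeatable, which helps when counting comparisons experimentally.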

Remarks
• Not much different from Q-sort, except that earlier the algorithm was deterministic and the bounds were probabilistic.
• Here the algorithm itself is also randomized: we pick the pivot at random. Notice that the algorithm behaves no differently from there onwards.
• In the earlier case we can identify the worst-case input. Here no input is the worst case.

Randomized Select

Randomized Algorithms
• A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution:
    i ← random()
    if i = 0
        do A …
    else { i.e. i = 1 }
        do B …
• Its running time depends on the outcomes of the coin tosses.

Assumptions
• Coins are unbiased, and
• coin tosses are independent.
• The worst-case running time of a randomized algorithm may be large but occurs with very low probability (e.g., it occurs when all the coin tosses give “heads”).

Monte Carlo Algorithms
• Running times are guaranteed, but the output may not be completely correct.
• The probability of error is low.

Las Vegas Algorithms
• Output is guaranteed to be correct.
• Bounds on running times hold with high probability.
• What type of algorithm is Randomized Qsort?
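The contrast between the two classes can be seen on a toy problem that is not from the slides: find the index of a target value by probing random positions. The function names below are hypothetical. The Las Vegas version always returns a correct index but takes a random number of probes; the Monte Carlo version makes a fixed number of probes but may fail to find the target.

```python
import random


def las_vegas_find(items, target, rng=random):
    """Always returns a correct index; the number of probes is random.

    Assumes target occurs in items, otherwise this loops forever.
    """
    while True:
        i = rng.randrange(len(items))
        if items[i] == target:
            return i


def monte_carlo_find(items, target, max_probes, rng=random):
    """Makes at most max_probes probes; may return None even if target exists."""
    for _ in range(max_probes):
        i = rng.randrange(len(items))
        if items[i] == target:
            return i
    return None
```

If half of the entries equal the target, each probe misses with probability 1/2, so the Monte Carlo version with k probes returns None with probability 1/2^k, while the Las Vegas version needs 2 probes in expectation.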

Why expected running times?
• Markov’s inequality: $P(X > k\,E(X)) < 1/k$
• That is, the probability that the algorithm takes more than $2\,E(X)$ time is less than 1/2, and the probability that it takes more than $10\,E(X)$ time is less than 1/10.
• This is the reason why Qsort does well in practice.
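Markov’s inequality can be observed on sampled data. The sketch below is an illustration with made-up names: it uses the number of fair coin tosses until the first head as the non-negative random variable and reports how often a sample exceeds k times the sample mean. Applied to the empirical distribution, the bound holds for any non-negative sample, so the observed fraction never exceeds 1/k.

```python
import random


def tosses_until_heads(rng):
    """Number of fair coin tosses up to and including the first head."""
    count = 1
    while rng.random() < 0.5:
        count += 1
    return count


def tail_count(samples, k):
    """Count samples strictly greater than k times the sample mean.

    Compares x * n > k * total in integers to avoid rounding; assumes
    non-negative integer samples and a positive integer k.
    """
    n = len(samples)
    total = sum(samples)
    return sum(1 for x in samples if x * n > k * total)


rng = random.Random(2009)  # fixed seed so runs are repeatable
samples = [tosses_until_heads(rng) for _ in range(10_000)]
for k in (2, 5, 10):
    fraction = tail_count(samples, k) / len(samples)
    print(f"k={k}: fraction above k*mean = {fraction:.4f}, Markov bound = {1 / k:.4f}")
```

The observed fractions are typically far below 1/k, which reflects that Markov’s inequality uses only the mean and is therefore a loose bound.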

Markov’s Bound
$P(X > k\mu) < 1/k$, where $k$ is a constant.
Chernoff’s Bound
$P(X > 2\mu) < 1/2$
A Stronger Result
$P(X > k\mu) < 1/n^k$, where $k$ is a constant.

Binary Search Tree
• What is a binary search tree?
• A BST is a possibly empty rooted tree with a key value, a possibly empty left subtree and a possibly empty right subtree, in which keys in the left subtree are ≤ the root’s key and keys in the right subtree are greater.
• Each of the left subtree and the right subtree is a BST.

Binary Search Tree
• Pick the first item from the array and call it the pivot; it becomes the root of the BST.
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
• Recursively build a BST on each partition. They become the left and the right subtrees of the root.

Binary Search Tree
Consider the following input: 1, 2, 3, …, 10,000.
• What is the time for construction?
• Search time?

Randomly Built Binary Search Tree
• Pick an item from the array at random and call it the pivot; it becomes the root of the BST.
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
• Recursively build a BST on each partition. They become the left and the right subtrees of the root.
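Both constructions, first-item pivot and random pivot, can be sketched with one function. This is an illustrative sketch under stated assumptions: trees are nested tuples (left, key, right), the empty tree is None and has height 0, and all names are invented for this example.

```python
import random


def build_bst(items, randomized=False, rng=random):
    """Build a BST as nested tuples (left, key, right); None is the empty tree."""
    if not items:
        return None
    # Random pivot for the randomly built tree, first item otherwise.
    index = rng.randrange(len(items)) if randomized else 0
    pivot = items[index]
    rest = items[:index] + items[index + 1:]
    left = [x for x in rest if x <= pivot]
    right = [x for x in rest if x > pivot]
    return (build_bst(left, randomized, rng), pivot, build_bst(right, randomized, rng))


def height(tree):
    """Number of nodes on the longest root-to-leaf path (empty tree: 0)."""
    if tree is None:
        return 0
    left, _, right = tree
    return 1 + max(height(left), height(right))


def inorder(tree):
    """Keys of the tree in sorted order."""
    if tree is None:
        return []
    left, key, right = tree
    return inorder(left) + [key] + inorder(right)
```

On the sorted input 1, 2, …, 200 the first-item pivot yields a chain of height 200, while a random pivot usually gives a much shallower tree. The sketch is recursive, so a sorted input as long as 10,000 keys with the first-item pivot would exceed Python’s default recursion limit.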

Example
• Consider the input 10, 20, 30, 40, 50, 60, 70, 80, 90, 100.

Height of the RBST
WLOG, assume that the keys are distinct. (What if they are not?)
• Rank(x) = number of elements < x
• Let $X_i$: height of the tree rooted at a node with rank $i$.
• Let $Y_i$: exponential height of the tree, $Y_i = 2^{X_i}$.
• Let $H$: height of the entire BST; then $H = \max\{H_1, H_2\} + 1$, where $H_1$ is the height of the left subtree and $H_2$ is the height of the right subtree.

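• $Y = 2^H = 2 \cdot \max\{2^{H_1}, 2^{H_2}\}$
• $E(EH(T(n)))$: expected value of the exponential height of the tree with $n$ nodes.
• $E(EH(T(n))) = \frac{2}{n} \sum_{k=0}^{n-1} E\bigl(\max\{EH(T(k)), EH(T(n-1-k))\}\bigr) = O(n^3)$
• $E(H(T(n))) = E(\log EH(T(n))) \le \log E(EH(T(n))) = O(\log n)$, where the inequality is Jensen’s inequality for the concave function $\log$.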
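The step from the recurrence to the cubic bound can be spelled out as follows. This is a sketch along the lines of the standard textbook argument; the shorthand $y_n = E(EH(T(n)))$ and the constant $c$, chosen large enough to cover the base cases, are introduced here and are not in the slides.

```latex
\begin{align*}
y_n &= \frac{2}{n}\sum_{k=0}^{n-1} E\bigl(\max\{EH(T(k)),\, EH(T(n-1-k))\}\bigr)
     \le \frac{2}{n}\sum_{k=0}^{n-1} \bigl(y_k + y_{n-1-k}\bigr)
     = \frac{4}{n}\sum_{k=0}^{n-1} y_k , \\
y_n &\le \frac{4}{n}\sum_{k=0}^{n-1} c\binom{k+3}{3}
     = \frac{4c}{n}\binom{n+3}{4}
     = c\binom{n+3}{3} = O(n^3)
     \quad\text{(induction on } n\text{)}, \\
2^{E(H(T(n)))} &\le E\bigl(2^{H(T(n))}\bigr) = y_n
     \;\Longrightarrow\; E(H(T(n))) \le \log_2 y_n = O(\log n).
\end{align*}
```

The first inequality uses max{a, b} ≤ a + b for non-negative values, and the last line is Jensen’s inequality applied to the convex function 2^x.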

• Construction time?
• Search time?
• What is the worst-case input?

Acknowledgements
• Kunal Verma
• Nidhi Aggarwal
• And other students of the MSc(CS) batch of 2009.

Hashing
• Motivation: symbol tables
• A compiler uses a symbol table to relate symbols to associated data
    • Symbols: variable names, procedure names, etc.
    • Associated data: memory location, call graph, etc.
• For a symbol table (also called a dictionary), we care about search, insertion, and deletion
• We typically don’t care about sorted order

Hash Tables
• More formally:
• Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
    • Insert(T, x)
    • Delete(T, x)
    • Search(T, x)
• We want these to be fast, but don’t care about sorting the records
• The structure we will use is a hash table
• Supports all the above in O(1) expected time!

Hash Functions
• Next problem: collision
[Figure: keys k1, …, k5 from the set K of actual keys, a subset of the universe of keys U, are mapped by h into slots 0 to m-1 of table T; h(k2) = h(k5) is a collision.]

Resolving Collisions
• How can we solve the problem of collisions?
• One solution: chaining
• Another solution: open addressing

Chaining
• Chaining puts elements that hash to the same slot in a linked list.
[Figure: keys k1, …, k8 from the set K of actual keys, a subset of the universe of keys U, stored in table T; each slot points to a linked list of the keys that hash to it.]

Chaining
• How do we insert an element?

Chaining
• How do we delete an element?

Chaining
• How do we search for an element with a given key?
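One possible answer to the three questions above is sketched below. It is a minimal illustration, not the slides’ own implementation: each chain is a Python list standing in for the linked list, Python’s built-in hash() stands in for the hash function h, and the class and method names are assumptions made for this example.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list used as the chain."""

    def __init__(self, num_slots=8):
        self.slots = [[] for _ in range(num_slots)]
        self.count = 0

    def _chain(self, key):
        # Python's built-in hash() stands in for the hash function h.
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))
        self.count += 1

    def search(self, key):
        """Return the stored value, or None if the key is absent."""
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        """Remove the key if present; return whether anything was removed."""
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.count -= 1
                return True
        return False

    def load_factor(self):
        # n / m: stored keys divided by number of slots.
        return self.count / len(self.slots)
```

Every operation hashes the key once and then walks a single chain, so its cost is proportional to that chain’s length. With 10 keys in 4 slots the average chain holds 2.5 keys.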

Analysis of Chaining
• Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot.
• Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot.
• What will be the average cost of an unsuccessful search for a key? A: O(1 + α)

Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots in the table, what is α? A: α = O(1)
• In other words, we can make the expected cost of searching constant if we make α constant.

If we could prove this:
• $P(\text{failure}) < 1/k$ (we are sort of happy)
• $P(\text{failure}) < 1/n^k$ (most of the time this is true and we’re happy)
• $P(\text{failure}) < 1/2^n$ (this is difficult, but still we want this)

END