Data Structure and Algorithms BITS Pilani Campus Dr. Maheswari Karthikeyan Lecture 9 20/04/2013

Topics
• Hashing
• Dynamic Programming

Hash Table
• Hash table: a data structure, implemented as an array of objects, in which search keys are mapped to array indexes
• A “much smarter” version of an array
• Searching takes O(1) time on average
• Drawbacks:
– Traversal visits the data in random order
– No range search

Direct Addressing
• If the key field is “age”, there are at most 120 distinct key values
• The domain of ages is small
• So an array with 120 slots can be built
• The array supports O(1)-time search by “direct addressing”
• What if multiple persons have the same age?
• What if the domain of the key field is very large? E.g., Salary: [0 – 100,000]
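Direct addressing on the age key can be sketched as follows. This is a minimal illustration, not part of the lecture: the names `NUM_SLOTS`, `make_table`, `insert` and `search` are hypothetical, ages are assumed to lie in 0..119, and keeping a list in every slot is only one possible answer to the "same age" question above.

```python
NUM_SLOTS = 120  # assumption: ages 0..119, matching the 120 slots on the slide


def make_table():
    """Create a direct-address table with one (initially empty) list per age."""
    return [[] for _ in range(NUM_SLOTS)]


def insert(table, age, name):
    """The key itself is the array index, so no search is needed."""
    table[age].append(name)


def search(table, age):
    """Return every record stored under this age (empty list if none)."""
    return table[age]


students = make_table()
insert(students, 20, "Asha")
insert(students, 20, "Ravi")   # two persons with the same age share a slot
insert(students, 35, "Meena")
print(search(students, 20))
```

With a key domain such as salary, the same table would need one slot per possible salary, which is the motivation for hashing.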

Direct Addressing
[Figure: student records stored in an array at the indexes given by their keys: 1, 2, 3, 8, 9, 13, 14]

Hashing
• A hash function maps each key to a location in memory.
• A key’s location does not depend on other elements, and does not change after insertion.
• A good hash function should be easy to compute.
• With such a hash function, the dictionary operations can be implemented in O(1) time on average.

Hashing
• Map key values to hash table addresses: keys -> hash table address
• The same mapping is used for find, insert, and remove
• Usually: integers -> {0, 1, 2, …, Hsize-1}
– Typical example: f(n) = n mod Hsize
• Non-numeric keys are converted to numbers
– For example, strings can be converted to numbers using
  • the sum of the ASCII values of the characters
  • the first three characters
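The two mappings above can be sketched in a few lines. The function names `hash_int` and `hash_string` are illustrative only, and the string version implements just the "sum of ASCII values" option.

```python
def hash_int(key: int, hsize: int) -> int:
    """f(n) = n mod Hsize: maps an integer key into {0, ..., hsize - 1}."""
    return key % hsize


def hash_string(key: str, hsize: int) -> int:
    """Convert a string to a number by summing its character codes, then reduce mod hsize."""
    return sum(ord(ch) for ch in key) % hsize


print(hash_int(53, 11))         # 53 = 4 * 11 + 9
print(hash_string("abc", 10))   # (97 + 98 + 99) mod 10
print(hash_string("cab", 10))   # same characters, so the same address
```

Because the sum ignores character order, anagrams such as "abc" and "cab" always receive the same address, which is one reason collisions must be handled.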

Hashing (mod 9)
[Figure: student records placed in a hash table using key mod 9; values shown: 9, 10, 20, 39, 4, 14, 8]

Hashing
• Choose a hash function h; it also determines the hash table size.
• Given an item x with key k, put x at location h(k).
• To find out whether x is in the set, check location h(k).
• What if more than one key hashes to the same location? This is called a collision.
• Two methods to handle collisions:
– Separate chaining
– Open addressing

Separate chaining
• Maintain a list of all elements that hash to the same value
• Search: use the hash function to determine which list to traverse

Table with 11 buckets (0 to 10), h(k) = k mod 11:
0:
1: 23 -> 1 -> 56
2: 24
3: 36 -> 14
4:
5: 16
6: 17
7: 7 -> 29
8:
9: 31 -> 20 -> 42
10:

Separate chaining (insert 53)
53 = 4 x 11 + 9, so 53 mod 11 = 9 and 53 is added to the list in bucket 9.
Bucket 9 before: 31 -> 20 -> 42
Bucket 9 after: 31 -> 20 -> 42 -> 53
All other buckets are unchanged.
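A minimal sketch of separate chaining, assuming integer keys, h(k) = k mod B, and Python lists standing in for the linked lists on the slide. The class name `ChainedHashTable` and its methods are illustrative, not taken from the lecture.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each bucket holds a list of keys."""

    def __init__(self, num_buckets: int = 11) -> None:
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key: int) -> int:
        return key % len(self.buckets)

    def insert(self, key: int) -> None:
        chain = self.buckets[self._index(key)]
        if key not in chain:          # ignore duplicates
            chain.append(key)         # new keys go to the end of the chain

    def contains(self, key: int) -> bool:
        # Only the one chain selected by the hash function is traversed.
        return key in self.buckets[self._index(key)]

    def remove(self, key: int) -> bool:
        chain = self.buckets[self._index(key)]
        if key in chain:
            chain.remove(key)
            return True
        return False

    def load_factor(self) -> float:
        """N / B: number of stored keys divided by number of buckets."""
        return sum(len(chain) for chain in self.buckets) / len(self.buckets)


table = ChainedHashTable(11)
for key in [23, 24, 36, 1, 14, 16, 17, 7, 29, 31, 20, 56, 42]:
    table.insert(key)
table.insert(53)                      # 53 mod 11 = 9
print(table.buckets[9])
```

Appending at the end of a chain mirrors the slide; inserting at the front would work equally well and avoids walking the chain when duplicates do not need to be detected.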

Analysis of Hashing with Chaining
Worst case
– All keys hash into the same bucket, which forms a single linked list.
– insert, delete, find take O(n) time.
Average case
– Keys are uniformly distributed into buckets
– O(1 + N/B) time per operation: N is the number of elements in the hash table, B is the number of buckets.
– If N = O(B), then O(1) time per operation.
– N/B is called the load factor of the hash table.

Open addressing
• If a collision happens, alternative cells are tried until an empty cell is found.
• Three methods of open addressing:
– Linear probing
– Quadratic probing
– Double hashing

Example table (11 slots, h(k) = k mod 11; "-" marks an empty slot):
Slot: 0   1   2   3   4   5   6   7   8   9   10
Key:  42  1   24  14  -   16  28  7   -   31  9

Linear Probing (insert 12)
12 = 1 x 11 + 1, so 12 mod 11 = 1. Slots 1, 2 and 3 are occupied, so 12 goes into the next empty slot, slot 4.
Slot:   0   1   2   3   4   5   6   7   8   9   10
Before: 42  1   24  14  -   16  28  7   -   31  9
After:  42  1   24  14  12  16  28  7   -   31  9

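Search with linear probing (search 15)
15 = 1 x 11 + 4, so 15 mod 11 = 4. The search probes slots 4, 5, 6 and 7 (keys 12, 16, 28, 7) and then reaches the empty slot 8, so the result is NOT FOUND.
Slot: 0   1   2   3   4   5   6   7   8   9   10
Key:  42  1   24  14  12  16  28  7   -   31  9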

Deletion in Hashing with Linear Probing
• Since empty buckets are used to terminate a search, standard deletion does not work.
• One simple idea is to not delete, but mark.
• Insert: put the item in the first empty or marked bucket.
• Search: continue past marked buckets.
• Delete: just mark the bucket as deleted.
• Advantage: easy and correct.
• Disadvantage: the table can become full of dead items.
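The insert, search and mark-on-delete rules above can be sketched as follows, assuming integer keys and h(k) = k mod B. The class `LinearProbingTable` and the `DELETED` marker are illustrative names; the insertion order used to build the example table is an assumption chosen so that it reproduces the table from the slides.

```python
EMPTY = None
DELETED = object()   # marker for a bucket whose item was deleted


class LinearProbingTable:
    def __init__(self, size: int = 11) -> None:
        self.slots = [EMPTY] * size

    def _probe(self, key: int):
        """Yield h(k), h(k)+1, h(k)+2, ... (mod table size), visiting each slot once."""
        size = len(self.slots)
        start = key % size
        for i in range(size):
            yield (start + i) % size

    def insert(self, key: int) -> int:
        # Put the item in the first empty or marked bucket.
        # (Simplification: no check for an existing copy of the key.)
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key
                return idx
        raise RuntimeError("hash table is full")

    def find(self, key: int):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:            # an empty bucket terminates the search
                return None
            if slot is not DELETED and slot == key:
                return idx               # marked buckets are skipped, not treated as a stop
        return None

    def delete(self, key: int) -> bool:
        idx = self.find(key)
        if idx is None:
            return False
        self.slots[idx] = DELETED        # mark instead of emptying
        return True


table = LinearProbingTable(11)
for key in [1, 24, 14, 16, 28, 7, 31, 9, 42]:
    table.insert(key)
slot_of_12 = table.insert(12)    # probes slots 1, 2, 3, then lands in 4
found_15 = table.find(15)        # stops at the empty slot 8
table.delete(14)                 # slot 3 becomes a marked bucket
print(slot_of_12, found_15, table.find(12))
```

After 14 is deleted, a search for 12 still succeeds because the probe continues past the marked slot 3; emptying that slot instead would cut the probe sequence short.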

Quadratic Probing
Avoids the primary clustering problem of linear probing
– Check H(x)
– If collision occurs check H(x) + 1
– If collision occurs check H(x) + 4
– If collision occurs check H(x) + 9
– If collision occurs check H(x) + 16
– ...
– In general, the i-th probe checks $H(x) + i^2$

Quadratic Probing (insert 12)
12 = 1 x 11 + 1, so 12 mod 11 = 1. The probes $(1 + i^2) \bmod 11$ for i = 0, 1, …, 5 visit slots 1, 2, 5, 10, 6, 4; slot 4 is the first empty one.
Slot:   0   1   2   3   4   5   6   7   8   9   10
Before: 42  1   24  14  -   16  28  7   -   31  9
After:  42  1   24  14  12  16  28  7   -   31  9

Double Hashing
When a collision occurs, use a second hash function
– Hash2(x) = R – (x mod R)
– R: the greatest prime number smaller than the table size
Inserting 12 (table size 11, R = 7): H2(x) = 7 – (x mod 7) = 7 – (12 mod 7) = 2
– Check H(x)
– If collision occurs check H(x) + 2
– If collision occurs check H(x) + 4
– If collision occurs check H(x) + 6
– If collision occurs check H(x) + 8
– In general, the i-th probe checks H(x) + i * H2(x)

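Double Hashing (insert 12)
12 = 1 x 11 + 1, so 12 mod 11 = 1, and H2(12) = 7 – (12 mod 7) = 2. The probes $(1 + 2i) \bmod 11$ for i = 0, 1, …, 7 visit slots 1, 3, 5, 7, 9, 0, 2, 4; slot 4 is the first empty one.
Slot:   0   1   2   3   4   5   6   7   8   9   10
Before: 42  1   24  14  -   16  28  7   -   31  9
After:  42  1   24  14  12  16  28  7   -   31  9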

Collision Functions
Linear probing: $H_i(x) = (H(x) + i) \bmod B$
Quadratic probing: $H_i(x) = (H(x) + i^2) \bmod B$
Double hashing: $H_i(x) = (H(x) + i \cdot H_2(x)) \bmod B$
Here B is the table size and i = 0, 1, 2, … is the probe number.
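The three collision functions can be compared side by side with a short sketch. It assumes H(x) = x mod B and H2(x) = R – (x mod R) with R = 7, as in the insert-12 example; the names `probe_sequence` and `insert` are illustrative, and at most B probes are tried before giving up.

```python
def probe_sequence(x: int, b: int, method: str, r: int = 7):
    """Yield the slots H_0(x), H_1(x), ..., H_{b-1}(x) for the chosen method."""
    h = x % b                # primary hash H(x)
    h2 = r - (x % r)         # secondary hash H2(x), used only by double hashing
    for i in range(b):
        if method == "linear":
            yield (h + i) % b
        elif method == "quadratic":
            yield (h + i * i) % b
        elif method == "double":
            yield (h + i * h2) % b
        else:
            raise ValueError(f"unknown method: {method}")


def insert(table: list, x: int, method: str, r: int = 7) -> int:
    """Place x in the first empty slot (None) along its probe sequence."""
    for slot in probe_sequence(x, len(table), method, r):
        if table[slot] is None:
            table[slot] = x
            return slot
    raise RuntimeError("no free slot found along the probe sequence")


# Table from the slides; None marks the empty slots 4 and 8.
start = [42, 1, 24, 14, None, 16, 28, 7, None, 31, 9]
for method in ("linear", "quadratic", "double"):
    table = list(start)
    print(method, list(probe_sequence(12, 11, method))[:8], insert(table, 12, method))
```

For this table all three methods happen to place 12 in slot 4, but they reach it after different numbers of probes (4, 6 and 8 respectively).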

Example
Insert the following values into a hash table of size 10 using the hash function $h(x) = (x^2 + 1) \bmod 10$. Do this three times, once each with linear probing, quadratic probing and separate chaining.
Insert the values in this order: 1, 2, 5, 6, 8, 4, 9, 3, 10, 7

Dynamic Programming
• An algorithm design technique for optimization problems (similar to divide and conquer)
• Applicable when subproblems are not independent
– Subproblems share subsubproblems
– A divide-and-conquer approach would repeatedly solve the common subproblems
– Dynamic programming solves every subproblem just once and stores the answer in a table

Dynamic Programming
Used for optimization problems
– A set of choices must be made to get an optimal solution
– Find a solution with the optimal value (minimum or maximum)
– There may be many solutions that achieve the optimal value, so we speak of an optimal solution rather than the optimal solution

Dynamic Programming Algorithm
1. Characterize the structure of an optimal solution
2. Recursively define the value of an optimal solution
3. Compute the value of an optimal solution in a bottom-up fashion
4. Construct an optimal solution from computed information

Elements of Dynamic Programming
• Optimal Substructure
– An optimal solution to a problem contains within it optimal solutions to subproblems
– The optimal solution to the entire problem is built in a bottom-up manner from optimal solutions to subproblems
• Overlapping Subproblems
– If a recursive algorithm revisits the same subproblems over and over, the problem has overlapping subproblems

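Matrix-Chain Multiplication
Problem: given a sequence of matrices $A_1, A_2, \ldots, A_n$, compute the product $A_1 \cdot A_2 \cdots A_n$
Matrix compatibility for $C = A \cdot B$:
– $\mathrm{col}_A = \mathrm{row}_B$
– $\mathrm{row}_C = \mathrm{row}_A$
– $\mathrm{col}_C = \mathrm{col}_B$
For the chain $A_1 \cdot A_2 \cdots A_i \cdot A_{i+1} \cdots A_n$: $\mathrm{col}_i = \mathrm{row}_{i+1}$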

Matrix-Chain Multiplication
In what order should we multiply the matrices $A_1 \cdot A_2 \cdots A_n$?
• Parenthesize the product to get the order in which the matrices are multiplied
• E.g.: $A_1 \cdot A_2 \cdot A_3 = ((A_1 \cdot A_2) \cdot A_3) = (A_1 \cdot (A_2 \cdot A_3))$
• Which one of these orderings should we choose?
– The order in which we multiply the matrices has a significant impact on the cost of evaluating the product

MATRIX-MULTIPLY(A, B)
if columns[A] ≠ rows[B]
    then error “incompatible dimensions”
    else for i ← 1 to rows[A]
             do for j ← 1 to columns[B]
                    do C[i, j] ← 0
                       for k ← 1 to columns[A]
                           do C[i, j] ← C[i, j] + A[i, k] · B[k, j]
         return C
Cost: rows[A] · cols[A] · cols[B] scalar multiplications
[Figure: entry C[i, j] is computed from row i of A and column j of B; C has rows[A] rows and cols[B] columns]
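A runnable counterpart of MATRIX-MULTIPLY, sketched in Python under the assumption that matrices are non-empty lists of equal-length rows. Returning the number of scalar multiplications alongside the product is an addition for illustration; the function name `matrix_multiply` is not from the lecture.

```python
def matrix_multiply(a, b):
    """Return (C, count) where C = A * B and count is the number of scalar multiplications."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    if cols_a != rows_b:
        raise ValueError("incompatible dimensions")

    c = [[0] * cols_b for _ in range(rows_a)]
    count = 0
    for i in range(rows_a):
        for j in range(cols_b):
            for k in range(cols_a):
                c[i][j] += a[i][k] * b[k][j]
                count += 1          # one scalar multiplication per innermost step
    return c, count


a = [[1, 2, 3],
     [4, 5, 6]]            # 2 x 3
b = [[7, 8],
     [9, 10],
     [11, 12]]             # 3 x 2
product, count = matrix_multiply(a, b)
print(product, count)      # count equals rows[A] * cols[A] * cols[B] = 2 * 3 * 2
```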

Example: $A_1 \cdot A_2 \cdot A_3$
$A_1$: 10 x 100, $A_2$: 100 x 5, $A_3$: 5 x 50
1. $((A_1 A_2) A_3)$:
– $A_1 A_2$ = 10 x 100 x 5 = 5,000 (result is 10 x 5)
– $(A_1 A_2) A_3$ = 10 x 5 x 50 = 2,500
– Total: 7,500 scalar multiplications
2. $(A_1 (A_2 A_3))$:
– $A_2 A_3$ = 100 x 5 x 50 = 25,000 (result is 100 x 50)
– $A_1 (A_2 A_3)$ = 10 x 100 x 50 = 50,000
– Total: 75,000 scalar multiplications
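The arithmetic for the two orderings can be checked with a short sketch. The helper `multiply_cost` is a hypothetical name; it simply applies the rows[A] · cols[A] · cols[B] cost of one matrix product.

```python
def multiply_cost(rows_a: int, cols_a: int, cols_b: int) -> int:
    """Scalar multiplications needed for a (rows_a x cols_a) times (cols_a x cols_b) product."""
    return rows_a * cols_a * cols_b


# A1: 10 x 100, A2: 100 x 5, A3: 5 x 50
# ((A1 A2) A3): first A1 A2 (10 x 5 result), then multiply by A3.
cost_left = multiply_cost(10, 100, 5) + multiply_cost(10, 5, 50)

# (A1 (A2 A3)): first A2 A3 (100 x 50 result), then A1 times that result.
cost_right = multiply_cost(100, 5, 50) + multiply_cost(10, 100, 50)

print(cost_left, cost_right)
```

The second ordering is ten times more expensive even though both produce the same 10 x 50 matrix.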

Matrix-Chain Multiplication
Given a chain of matrices $A_1, A_2, \ldots, A_n$, where for $i = 1, 2, \ldots, n$ matrix $A_i$ has dimensions $p_{i-1} \times p_i$, fully parenthesize the product $A_1 \cdot A_2 \cdots A_n$ in a way that minimizes the number of scalar multiplications.
Dimensions: $A_1$: $p_0 \times p_1$; $A_2$: $p_1 \times p_2$; …; $A_i$: $p_{i-1} \times p_i$; $A_{i+1}$: $p_i \times p_{i+1}$; …; $A_n$: $p_{n-1} \times p_n$

1. The Structure of an Optimal Parenthesization
Notation: $A_{i \ldots j} = A_i A_{i+1} \cdots A_j$, $i \le j$
For $i < j$: $A_{i \ldots j} = A_i A_{i+1} \cdots A_k A_{k+1} \cdots A_j = A_{i \ldots k} \, A_{k+1 \ldots j}$
Suppose that an optimal parenthesization of $A_{i \ldots j}$ splits the product between $A_k$ and $A_{k+1}$, where $i \le k < j$

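Optimal Substructure
$A_{i \ldots j} = A_{i \ldots k} \, A_{k+1 \ldots j}$
• The parenthesization of the “prefix” $A_{i \ldots k}$ must be an optimal parenthesization; otherwise, replacing it with a cheaper one would lower the total cost. The same holds for the “suffix” $A_{k+1 \ldots j}$.
• An optimal solution to an instance of the matrix-chain multiplication problem contains within it optimal solutions to subproblems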

2. A Recursive Solution
Subproblem:
• Determine the minimum cost of parenthesizing $A_{i \ldots j} = A_i A_{i+1} \cdots A_j$ for $1 \le i \le j \le n$
• Let m[i, j] = the minimum number of multiplications needed to compute $A_{i \ldots j}$
– Full problem ($A_{1 \ldots n}$): m[1, n]
– i = j: $A_{i \ldots i} = A_i$, so m[i, i] = 0 for i = 1, 2, …, n

2. A Recursive Solution
Consider the subproblem of parenthesizing $A_{i \ldots j} = A_i A_{i+1} \cdots A_j$ for $1 \le i \le j \le n$.
Assume that the optimal parenthesization splits the product at k ($i \le k < j$): $A_{i \ldots j} = A_{i \ldots k} \, A_{k+1 \ldots j}$
$m[i, j] = m[i, k] + m[k+1, j] + p_{i-1} p_k p_j$
– m[i, k]: minimum number of multiplications to compute $A_{i \ldots k}$
– m[k+1, j]: minimum number of multiplications to compute $A_{k+1 \ldots j}$
– $p_{i-1} p_k p_j$: number of multiplications to compute the final product $A_{i \ldots k} \cdot A_{k+1 \ldots j}$

2. A Recursive Solution (cont.)
$m[i, j] = m[i, k] + m[k+1, j] + p_{i-1} p_k p_j$
• We do not know the value of k
– There are j – i possible values for k: k = i, i+1, …, j-1
• Minimizing the cost of parenthesizing the product $A_i A_{i+1} \cdots A_j$ becomes:
$$m[i, j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \} & \text{if } i < j \end{cases}$$
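The recurrence can be transcribed almost literally into a memoized recursive function. This is a top-down sketch, not the bottom-up procedure the lecture develops next; the name `min_multiplications` is illustrative, and `p` is assumed to be the dimension list $p_0, p_1, \ldots, p_n$ with at least two entries.

```python
from functools import lru_cache


def min_multiplications(p):
    """Return m[1, n] for a chain whose matrix A_i has dimensions p[i-1] x p[i]."""
    n = len(p) - 1

    @lru_cache(maxsize=None)      # store each m(i, j) so it is computed only once
    def m(i, j):
        if i == j:
            return 0              # a single matrix needs no multiplication
        # Try every split point k with i <= k < j and keep the cheapest.
        return min(
            m(i, k) + m(k + 1, j) + p[i - 1] * p[k] * p[j]
            for k in range(i, j)
        )

    return m(1, n)


print(min_multiplications([10, 100, 5, 50]))   # A1: 10x100, A2: 100x5, A3: 5x50
```

Without the cache the same m(i, j) values would be recomputed many times, which is exactly the overlapping-subproblems situation dynamic programming is meant to avoid.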

3. Computing the Optimal Costs
$$m[i, j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \} & \text{if } i < j \end{cases}$$
• How many subproblems do we have?
– Parenthesize $A_{i \ldots j}$ for $1 \le i \le j \le n$
– One problem for each choice of i and j: $\Theta(n^2)$ subproblems
[Figure: n x n table with one entry per pair (i, j)]

3. Computing the Optimal Costs (cont.)
$$m[i, j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \} & \text{if } i < j \end{cases}$$
• How do we fill in the tables m[1..n, 1..n] and s[1..n, 1..n]?
– Determine which entries of the table are used in computing m[i, j]: $A_{i \ldots j} = A_{i \ldots k} \, A_{k+1 \ldots j}$
– Fill in m so that it corresponds to solving problems of increasing length

3. Computing the Optimal Costs (cont.)
MATRIX-CHAIN-ORDER(p)
n ← length[p] – 1
for i ← 1 to n
    do m[i, i] ← 0
for l ← 2 to n
    do for i ← 1 to n – l + 1
           do j ← i + l – 1
              m[i, j] ← ∞
              for k ← i to j – 1
                  do q ← m[i, k] + m[k+1, j] + p_{i-1} · p_k · p_j
                     if q < m[i, j]
                         then m[i, j] ← q
                              s[i, j] ← k
return m and s
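MATRIX-CHAIN-ORDER translates directly into Python. In this sketch the tables are lists of lists with an unused row and column 0 so that the indexes match the 1-based pseudocode; that layout and the name `matrix_chain_order` are choices made for illustration.

```python
import math


def matrix_chain_order(p):
    """Return tables (m, s), 1-indexed, for matrices A_i of size p[i-1] x p[i]."""
    n = len(p) - 1
    # m[i][i] = 0 is already covered by the zero initialisation.
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]

    for length in range(2, n + 1):            # l in the pseudocode: chain length
        for i in range(1, n - length + 2):    # i = 1 .. n - l + 1
            j = i + length - 1
            m[i][j] = math.inf
            for k in range(i, j):             # k = i .. j - 1
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k               # remember the best split point
    return m, s


m, s = matrix_chain_order([10, 100, 5, 50])
print(m[1][3], s[1][3])
```

The three nested loops give a running time of O(n³), and the two tables use O(n²) space.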

3. Computing the Optimal Costs (cont.)
$$m[i, j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \} & \text{if } i < j \end{cases}$$
• Length = 1: i = j, i = 1, 2, …, n
• Length = 2: j = i + 1, i = 1, 2, …, n-1
• m[1, n] gives the optimal cost for the whole problem
• Compute rows from bottom to top and from left to right
• In a similar matrix s we keep the optimal values of k
[Figure: the m table with axes i and j, marking which entries are filled first and which second]

Example: $m[i, j] = \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \}$
$$m[2, 5] = \min \begin{cases} m[2, 2] + m[3, 5] + p_1 p_2 p_5 & (k = 2) \\ m[2, 3] + m[4, 5] + p_1 p_3 p_5 & (k = 3) \\ m[2, 4] + m[5, 5] + p_1 p_4 p_5 & (k = 4) \end{cases}$$
• Values m[i, j] depend only on values that have been previously computed
[Figure: 6 x 6 table m with axes i and j, highlighting the entries used to compute m[2, 5]]

Example: $m[i, j] = \min_{i \le k < j} \{ m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \}$
Compute $A_1 \cdot A_2 \cdot A_3$
$A_1$: 10 x 100 ($p_0 \times p_1$), $A_2$: 100 x 5 ($p_1 \times p_2$), $A_3$: 5 x 50 ($p_2 \times p_3$)
m[i, i] = 0 for i = 1, 2, 3
m[1, 2] = m[1, 1] + m[2, 2] + $p_0 p_1 p_2$ = 0 + 10 * 100 * 5 = 5,000 ($A_1 A_2$)
m[2, 3] = m[2, 2] + m[3, 3] + $p_1 p_2 p_3$ = 0 + 100 * 5 * 50 = 25,000 ($A_2 A_3$)
m[1, 3] = min of:
– m[1, 1] + m[2, 3] + $p_0 p_1 p_3$ = 0 + 25,000 + 10 * 100 * 50 = 75,000 ($A_1 (A_2 A_3)$)
– m[1, 2] + m[3, 3] + $p_0 p_2 p_3$ = 5,000 + 0 + 10 * 5 * 50 = 7,500 ($(A_1 A_2) A_3$)
so m[1, 3] = 7,500.

Table m (rows i, columns j):
       j=1    j=2     j=3
i=1    0      5000    7500
i=2           0       25000
i=3                   0

Construct the Optimal Solution
• Store the optimal choice made at each subproblem
• s[i, j] = a value of k such that an optimal parenthesization of $A_{i \ldots j}$ splits the product between $A_k$ and $A_{k+1}$
[Figure: the s table, indexed by i and j, with entry s[i, j] = k]

Construct the Optimal Solution
• s[1, n] is associated with the entire product $A_{1 \ldots n}$
– The final matrix multiplication will be split at k = s[1, n]: $A_{1 \ldots n} = A_{1 \ldots s[1, n]} \, A_{s[1, n]+1 \ldots n}$
– For each subproduct, recursively find the corresponding value of k that results in an optimal parenthesization

4. Construct the Optimal Solution
• s[i, j] = value of k such that the optimal parenthesization of $A_i A_{i+1} \cdots A_j$ splits the product between $A_k$ and $A_{k+1}$
[Figure: the s table for a chain of six matrices, indexed by i and j]
• s[1, n] = 3, so $A_{1..6} = A_{1..3} \, A_{4..6}$
• s[1, 3] = 1, so $A_{1..3} = A_{1..1} \, A_{2..3}$
• s[4, 6] = 5, so $A_{4..6} = A_{4..5} \, A_{6..6}$

4. Construct the Optimal Solution
Mult(A, s, i, j) {
    if (i < j) {
        X = Mult(A, s, i, s[i, j]);
        Y = Mult(A, s, s[i, j] + 1, j);
        return X * Y;
    }
    else return A[i];
}

4. Construct the Optimal Solution (cont.)
[Figure: the Mult(A, s, i, j) procedure shown next to the s table for the six-matrix example]

Example: $A_1 \cdots A_6$
Mult(A, s, 1, 6) produces ((A1 (A2 A3)) ((A4 A5) A6)).
Recursion tree, written as (i, j): subproduct:
(1, 6): (A1 . (A2 . A3)) . ((A4 . A5) . A6)
    (1, 3): A1 . (A2 . A3)
        (1, 1): A1
        (2, 3): A2 . A3
            (2, 2): A2
            (3, 3): A3
    (4, 6): (A4 . A5) . A6
        (4, 5): A4 . A5
            (4, 4): A4
            (5, 5): A5
        (6, 6): A6
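The recursion in Mult can be sketched so that it builds the parenthesized expression as a string instead of multiplying actual matrices. The function name `parenthesize` and the dictionary layout of `s` are assumptions for illustration. Only the split points the recursion actually visits are listed: three are stated on the slides, and the two for chains of length 2 are forced because such a chain has a single possible split.

```python
def parenthesize(s, i, j):
    """Return the optimal parenthesization of A_i ... A_j as a string."""
    if i == j:
        return f"A{i}"
    k = s[(i, j)]                       # optimal split: (A_i..A_k)(A_k+1..A_j)
    left = parenthesize(s, i, k)
    right = parenthesize(s, k + 1, j)
    return f"({left}{right})"


# Split points for the six-matrix example.
s = {
    (1, 6): 3,
    (1, 3): 1,
    (4, 6): 5,
    (2, 3): 2,   # length-2 chain: the only possible split
    (4, 5): 4,   # length-2 chain: the only possible split
}
print(parenthesize(s, 1, 6))
```

Replacing the string concatenation with a matrix product, and `f"A{i}"` with the matrix itself, gives the Mult procedure.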

Example
Parenthesize the following matrices:
$A_1$: 5 x 4
$A_2$: 4 x 6
$A_3$: 6 x 2
$A_4$: 2 x 7
Write down the m matrix

Example
m[i, i] = 0 for i = 1, 2, 3, 4
m[1, 2] = m[1, 1] + m[2, 2] + $p_0 p_1 p_2$ = 0 + 5 * 4 * 6 = 120
m[2, 3] = m[2, 2] + m[3, 3] + $p_1 p_2 p_3$ = 0 + 4 * 6 * 2 = 48
m[3, 4] = m[3, 3] + m[4, 4] + $p_2 p_3 p_4$ = 0 + 6 * 2 * 7 = 84

Table m so far (rows i, columns j):
       j=1   j=2   j=3   j=4
i=1    0     120
i=2          0     48
i=3                0     84
i=4                      0

Example
m[1, 3] = min of:
– m[1, 1] + m[2, 3] + $p_0 p_1 p_3$ = 0 + 48 + 5 * 4 * 2 = 48 + 40 = 88
– m[1, 2] + m[3, 3] + $p_0 p_2 p_3$ = 120 + 0 + 5 * 6 * 2 = 120 + 60 = 180
so m[1, 3] = 88 (split at k = 1).
m[2, 4] = min of:
– m[2, 2] + m[3, 4] + $p_1 p_2 p_4$ = 0 + 84 + 4 * 6 * 7 = 84 + 168 = 252
– m[2, 3] + m[4, 4] + $p_1 p_3 p_4$ = 48 + 0 + 4 * 2 * 7 = 48 + 56 = 104
so m[2, 4] = 104 (split at k = 3).

Example
Table m (rows i, columns j):
       j=1   j=2   j=3   j=4
i=1    0     120   88    158
i=2          0     48    104
i=3                0     84
i=4                      0

Example
Table s (rows i, columns j):
       j=2   j=3   j=4
i=1    1     1     3
i=2          2     3
i=3                3
Optimal parenthesization: ((A1 . (A2 . A3)) . A4)

Example
What is the best order in which to multiply matrices A, B, C, D, E if A is 3 x 10, B is 10 x 200, C is 200 x 46, D is 46 x 150 and E is 150 x 5? Write down the dynamic programming array.