Design of Algorithms by Induction, Part 1. Algorithm Design and Analysis. Bibliography: [Manber], Chap. 5

Algorithm Analysis <-> Design • Given an algorithm, we can – Analyze its complexity – Prove its correctness • But: – Given a problem, how do we design a solution that is both correct and efficient? • Is there a general method for this?

Induction proofs vs Design of algorithms by induction (1) • Induction used to prove that a statement P(n) holds for all integers n: – Base case: Prove P(0) – Assumption: assume that P(n-1) is true – Induction step: prove that P(n-1) implies P(n) for all n>0 – Strong induction: when we assume P(k) is true for all k<=n-1 and use this in proving P(n)

Induction proofs vs Design of algorithms by induction (2) • Induction used in algorithm design: – Base case: Solve a small instance of the problem – Assumption: assume you can solve smaller instances of the problem – Induction step: Show you can construct the solution of the problem from the solution(s) of the smaller problem(s)

Design of algorithms by induction • Represents a fundamental design principle that underlies techniques such as divide & conquer, dynamic programming and even greedy algorithms • How do we reduce the problem to a smaller problem or a set of smaller problems? (n -> n-1, n/2, n/4, …?) • Examples, Part 1: – The successful party problem [Manber 5.3] – The celebrity problem [Manber 5.5] – The skyline problem [Manber 5.6] • Examples, Part 2: – One knapsack problem [Manber 5.10] – The Max Consecutive Subsequence [Manber 5.8]

The successful party problem • Problem: you are arranging a party and have a list of n persons that you could invite. In order to have a successful party, you want to invite as many people as possible, but every invited person must be friends with at least k of the other party guests. For each person, you know his/her friends. Find the set of invited people.

Successful party - Example (k=3) [Figure: friendship graph over Ann, Bob, Chris, Dan, Ellie and Finn. Chris is removed first, then Finn.] Party: {Ann, Bob, Dan, Ellie}

Successful party – Example 2 (k=3) [Figure: a second friendship graph over the same six persons. Chris, Finn and Ann are removed in turn, leaving only Bob, Dan and Ellie.] No party possible!

Successful party – Direct approach • Direct approach for solving: remove persons who have fewer than k friends • But: what is the right order of removal? 1. Remove all persons with fewer than k friends, then deal with the persons who are left without enough friends? 2. Remove one person first, then continue with the affected persons?

The design by induction approach • Instead of thinking about our algorithm as a sequence of steps to be executed, think of proving a theorem that the algorithm exists • We need to prove that this “theorem” holds for a base case, assume that it holds for “n-1”, and then prove that if it holds for “n-1” this implies that it holds for “n”

Successful party - Solution • Induction hypothesis: We know how to find the solution (maximal list of invited persons that have at least k friends among the other invited persons), provided that the number of given persons is <n • Base case: – n<= k: no person can be invited – n=k+1: if every person knows all of the other persons, everybody gets invited. Otherwise, no one can be invited.

Successful party - Solution • Inductive step: – Assume we know how to select the invited persons out of a list of n-1; show that we then know how to select the invited persons out of a list of n, n>k+1 – If all n persons have at least k friends among them, all n persons get invited => problem solved – Else, there exists at least one person who has fewer than k friends. This person cannot belong to any valid party, so we remove this person; the solution is what results from solving the problem for the remaining n-1 persons. By the induction assumption, we know how to solve the problem for n-1 => problem solved
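
The inductive step above translates almost directly into a recursive function. What follows is a minimal Python sketch, not code from the slides: the dictionary-of-sets representation `friends`, the function name `party` and the example friendships are assumptions made for illustration (the slides show the graph only as a figure).

```python
def party(friends, k):
    """Return the largest set of guests in which everyone has >= k friends.

    `friends` maps each person to the set of that person's friends.
    Friendship is assumed to be symmetric (an assumption of this sketch).
    """
    people = set(friends)
    # Base case: with k or fewer candidates nobody can have k friends present.
    if len(people) <= k:
        return set()
    # Look for a person with fewer than k friends among the candidates.
    for p in people:
        if len(friends[p] & people) < k:
            # p can never be invited: solve the problem for the other n-1.
            rest = {q: friends[q] - {p} for q in people if q != p}
            return party(rest, k)
    # Everybody has at least k friends among the candidates: invite all.
    return people


# Hypothetical friendships, chosen so that Chris and then Finn drop out.
example = {
    "Ann": {"Bob", "Dan", "Ellie", "Chris"},
    "Bob": {"Ann", "Dan", "Ellie", "Finn"},
    "Chris": {"Ann", "Finn"},
    "Dan": {"Ann", "Bob", "Ellie", "Finn"},
    "Ellie": {"Ann", "Bob", "Dan"},
    "Finn": {"Chris", "Bob", "Dan"},
}
print(sorted(party(example, 3)))
```

The sketch rebuilds the dictionary at every step, which keeps it close to the proof but is not efficient; the recursion depth is also at most n, which matters for large inputs in Python.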

Party – Designing the Solution
Function Party(ps : PersonSet)
  n = card(ps)
  if n <= k then
    * no person is invited
    return
  if n = k+1 then
    if (* everybody is friends with everybody in ps)
      * invite all persons from ps
    return
  if (* everybody from ps has at least k friends from ps)
    * invite all persons from ps
    return
  * find first person p who has fewer than k friends
  ps2 = ps – {p}
  Party(ps2)
  return

Party – Designing the Solution • Designing the algorithm (thinking about the solution) is very often a naturally recursive process. • The implementation does NOT need to be recursive!

Party – The Solution – No Recursion
Function Party(ps : PersonSet)
  n = card(ps)
  while (n > k)
    if (* everybody has at least k friends in ps)
      * invite all persons from ps
      return
    * find first person p who has fewer than k friends
    ps = ps – {p}
    n = n – 1
  * no person can be invited
  return

Party – The Solution – No Recursion • Designing the algorithm (thinking about the solution) is often done in abstract terms. • The implementation will choose the best data structures to represent these abstractions.
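
One possible choice of data structures for the iterative version is sketched below in Python. It is an illustration under stated assumptions, not the slides' implementation: `friends` is assumed to be a dictionary mapping each person to a set of friends (symmetric, no self-friendship), and the name `party_iterative` and the sample data are hypothetical. A counter per person plus a work queue avoids rescanning the whole set after every removal.

```python
from collections import deque


def party_iterative(friends, k):
    """Repeatedly drop people who have fewer than k friends left."""
    # Number of friends each person still has among the remaining candidates.
    degree = {p: len(fs) for p, fs in friends.items()}
    invited = set(friends)
    # Work queue of people already known to have too few friends.
    queue = deque(p for p in invited if degree[p] < k)
    while queue:
        p = queue.popleft()
        if p not in invited:
            continue  # defensive: p was already removed
        invited.remove(p)
        # Removing p costs each remaining friend one friendship.
        for q in friends[p]:
            if q in invited:
                degree[q] -= 1
                if degree[q] == k - 1:  # q has just dropped below k
                    queue.append(q)
    return invited


# Hypothetical friendships (not taken from the slides' figure).
example = {
    "Ann": {"Bob", "Dan", "Ellie", "Chris"},
    "Bob": {"Ann", "Dan", "Ellie", "Finn"},
    "Chris": {"Ann", "Finn"},
    "Dan": {"Ann", "Bob", "Ellie", "Finn"},
    "Ellie": {"Ann", "Bob", "Dan"},
    "Finn": {"Chris", "Bob", "Dan"},
}
print(sorted(party_iterative(example, 3)))
```

With this representation every person is removed at most once and every friendship is inspected at most twice, so the running time should be linear in the number of persons plus friendships.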

Successful party - Conclusions • Designing by induction got us to a correct algorithm, because we designed it as if we were proving its correctness – Every invited person knows at least k other invited persons – We got the maximum possible number of invited persons • The best way to reduce a problem is to eliminate some of its elements. – In this case, it was clear which persons to eliminate – We will see examples where the elimination is not so straightforward

The Celebrity problem • Problem: A celebrity in a group of people is someone who is known by everybody but does not know anyone. You are allowed to ask anyone from the group a question such as “Do you know that person?”, pointing to any other person from the group. Identify the celebrity (if one exists) by asking as few questions as possible • Formal statement: • Given an n*n matrix “know” with know[p, q] = true if p knows q and know[p, q] = false otherwise. • Determine whether there exists an i such that: – know[j, i] = true (for all j, j≠i) and know[i, j] = false (for all j, j≠i)

Celebrity problem [Figure: the know matrix; for a celebrity i, column i holds 1s and row i holds 0s, the diagonal entry aside]

Celebrity - Solution 1 • Brute force approach: ask questions in arbitrary order; for each person, ask about all the others => n*(n-1) questions asked

Celebrity - Solution 2 • Use induction: • Base case: Solution is trivial for n=1, one person is a celebrity by definition. • Assumption: Assume we leave out one person and that we can solve the problem for the remaining n-1 persons. – If there is a celebrity among the n-1 persons, we can find it – If there is no celebrity among the n-1 persons, we find this out • Induction step: • For the n'th person we have 3 possibilities: – The celebrity was among the n-1 persons – The n’th person is the celebrity – There is no celebrity in this group

Celebrity - Sol 2 - Induction step • We must solve the problem for n persons, assuming that we know how to solve it for n-1 • We leave out one person. • We choose this person randomly – let it be the n’th person • Case 1: There is a celebrity among the n-1 persons, say p. To check whether p is also a celebrity with respect to the n'th person – check if know[n, p] and not know[p, n]. If the check fails, p is not a celebrity and we continue as in Case 2 • Case 2: There is no celebrity among the n-1 persons. In this case, either person n is the celebrity, or there is no celebrity in the group. – To check this we have to find out if know[i, n] and not know[n, i], for all i <> n.

Celebrity – Solution 2
Function Celebrity_Sol2(n: integer) returns integer
  if n = 1 then
    return 1
  else
    p = Celebrity_Sol2(n-1);
    if p != 0 then // p is the celebrity among n-1
      if (knows[n, p] and not knows[p, n]) then
        return p
      end if
    end if
    forall i = 1..n-1
      if (knows[n, i] or not knows[i, n]) then
        return 0 // there is no celebrity
      end if
    end for
    return n // n is the celebrity
  end if
end function Celebrity_Sol2
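
For readers who want to run Solution 2, here is a hedged Python sketch of the same idea. It departs from the pseudocode in two labeled ways: persons are numbered from 0 and "no celebrity" is reported as `None` instead of 0. The function name `celebrity_sol2` and the sample matrix are assumptions made for illustration.

```python
def celebrity_sol2(knows, n):
    """Celebrity among persons 0..n-1, or None if there is none."""
    if n == 1:
        return 0  # a single person is a celebrity by definition
    last = n - 1
    p = celebrity_sol2(knows, n - 1)
    # Case 1: p is the celebrity of the first n-1 persons and also passes
    # the two checks against the last person.
    if p is not None and knows[last][p] and not knows[p][last]:
        return p
    # Otherwise only the last person can still be the celebrity:
    # this person must know nobody and be known by everybody.
    for i in range(last):
        if knows[last][i] or not knows[i][last]:
            return None
    return last


T, F = True, False
# Hypothetical group of four in which person 2 is the celebrity.
sample = [
    [F, T, T, F],
    [T, F, T, T],
    [F, F, F, F],
    [T, F, T, F],
]
print(celebrity_sol2(sample, 4))
```

The loop over all earlier persons at each level is what makes this version quadratic in the number of questions.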

Celebrity – Solution 2 Analysis • In Celebrity_Sol2, the recursive call Celebrity_Sol2(n-1) costs T(n-1) and the forall loop costs O(n), so T(n)=T(n-1)+O(n)

Celebrity – Solution 2 Analysis • T(n)=T(n-1)+n • T(1)=1 • T(n)=1+2+3+…+n=n*(n+1)/2 • T(n) is O(n^2) • We have reduced a problem of size n to a problem of size n-1. We then still have to relate the n-th element to the n-1 other elements, and here this is done by a sequence of checks which is O(n), so we get an algorithm of complexity O(n^2), which is the same as brute force. • If we want to reduce the complexity of the algorithm to O(n), we should have T(n)=T(n-1)+c

Celebrity – Solution 3 • The key idea here is to reduce the size of the problem from n persons to n-1, but in a clever way – by eliminating someone who is a noncelebrity. • After each question, we can eliminate a person – if knows[i, j] then i cannot be a celebrity => elim i – if not knows[i, j] then j cannot be a celebrity => elim j

Celebrity - Solution 3 • Use induction: • Base case: Solution is trivial for n=1, one person is a celebrity by definition. • Assumption: Assume we leave out one person who is not a celebrity and that we can solve the problem for the remaining n-1 persons. – If there is a celebrity among the n-1 persons, we can find it – If there is no celebrity among the n-1 persons, we find this out • Induction step: • For the n'th person we have only 2 possibilities left: – The celebrity was among the n-1 persons – There is no celebrity in this group

Celebrity - Sol 3 - Induction step • We must solve the problem for n persons, assuming that we know how to solve it for n-1 • We leave out one person. In order to decide which person to eliminate, we ask a question (to a random person i about a random person j). • The eliminated person is e=i or e=j, and we know that e is not a celebrity • Case 1: There is a celebrity among the n-1 persons that remain after eliminating e, say p. To check whether p is also a celebrity with respect to person e – check if know[e, p] and not know[p, e] • Case 2: There is no celebrity among the n-1 persons. In this case, there is no celebrity in the group. There is no longer any need to check whether e is a celebrity!

Celebrity – Solution 3
Function Celebrity_Sol3(S: Set of persons) return person
  if card(S) = 1 return S(1)
  pick any two distinct persons i, j in S
  if knows[i, j] then // i no celebrity
    elim = i
  else // if not knows[i, j] then j no celebrity
    elim = j
  end if
  p = Celebrity_Sol3(S-elim)
  if p != 0 and knows[elim, p] and not knows[p, elim]
    return p
  else
    return 0 // no celebrity
  end if
end function Celebrity_Sol3
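
A runnable Python sketch of Solution 3 follows. It is an illustration under assumptions rather than the slides' code: persons are 0-based indices into a boolean matrix, `None` stands for "no celebrity", the group is assumed to be non-empty, and the names `celebrity_sol3` and `_solve` are hypothetical. The candidates are kept in a list used as a stack so that eliminating one person costs constant time.

```python
def celebrity_sol3(knows, people):
    """Celebrity among `people` (non-empty iterable of indices) or None."""
    candidates = list(people)  # private copy that the recursion may shrink
    return _solve(knows, candidates)


def _solve(knows, candidates):
    if len(candidates) == 1:
        return candidates[0]
    # One question about two candidates eliminates a non-celebrity.
    i = candidates.pop()
    j = candidates.pop()
    if knows[i][j]:
        elim, stays = i, j  # i knows somebody, so i is no celebrity
    else:
        elim, stays = j, i  # j is not known by i, so j is no celebrity
    candidates.append(stays)
    p = _solve(knows, candidates)
    # The celebrity of the smaller group must also pass against elim.
    if p is not None and knows[elim][p] and not knows[p][elim]:
        return p
    return None


T, F = True, False
# Hypothetical group of four in which person 2 is the celebrity.
sample = [
    [F, T, T, F],
    [T, F, T, T],
    [F, F, F, F],
    [T, F, T, F],
]
print(celebrity_sol3(sample, range(4)))
```

Each level asks one question to eliminate and at most two to verify, which matches the recurrence T(n)=T(n-1)+c.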

Celebrity – Solution 3 Analysis • In Celebrity_Sol3, everything except the recursive call takes constant time, so T(n)=T(n-1)+c • Algorithm is O(n)

Celebrity – Eliminate Recursion • Designing a solution can be done more naturally in recursive terms • Having an efficient implementation requires eliminating the recursion • Iterative implementation: – Start with n candidates – Ask one question and eliminate one candidate – Repeat until only one candidate is left – It is necessary to verify whether the only candidate left is the celebrity; otherwise no celebrity is in the group

Celebrity – No Recursion
Function Celebrity_Sol3_NonRecursive(S: Set of persons) return person
  forall p in S do Push(p, Stack)
  while (Stack has more than 1 person) do
    i = Pop(Stack)
    j = Pop(Stack)
    if knows[i, j] then stays = j;
    else stays = i;
    Push(stays, Stack);
  end while
  p = Pop(Stack) // if there is a celebrity, then it is p
  forall i = 1..n
    if (i<>p) and (knows[p, i] or not knows[i, p]) then
      return 0 // there is no celebrity
  return p // the last person not eliminated is the celeb
end function Celebrity_Sol3_NonRecursive
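
The stack-based version maps closely onto a Python list. The sketch below is a minimal illustration under assumptions: persons are 0-based indices, the matrix `knows` is a non-empty list of boolean rows, `None` replaces the pseudocode's 0 for "no celebrity", and the name `celebrity_iterative` and the sample matrix are hypothetical.

```python
def celebrity_iterative(knows):
    """Return the celebrity's index, or None if there is no celebrity."""
    n = len(knows)
    stack = list(range(n))
    # Elimination phase: each question removes one non-celebrity.
    while len(stack) > 1:
        i = stack.pop()
        j = stack.pop()
        stack.append(j if knows[i][j] else i)
    p = stack.pop()  # if there is a celebrity, then it is p
    # Verification phase: p must know nobody and be known by everybody.
    for i in range(n):
        if i != p and (knows[p][i] or not knows[i][p]):
            return None
    return p


T, F = True, False
# Hypothetical group of four in which person 2 is the celebrity.
sample = [
    [F, T, T, F],
    [T, F, T, T],
    [F, F, F, F],
    [T, F, T, F],
]
print(celebrity_iterative(sample))
```

The elimination phase asks n-1 questions and the verification phase at most 2(n-1), so the total stays linear in n.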

Celebrity – No Recursion – Analysis • In Celebrity_Sol3_NonRecursive, the elimination loop (while) is O(n) and the verification loop (forall) is O(n) • Algorithm is O(n)

Celebrity - Conclusions • The size of the problem should be reduced (from n to n-1) in a clever way • This example shows that it sometimes pays off to expend some effort ( in this case – one question) to perform the reduction more effectively • Even if recursivity is used in the design phase, the implementation can be done iteratively

The Skyline Problem • Problem: Given the exact locations and heights of several rectangular buildings, having the bottoms on a fixed line, draw the skyline of these buildings, eliminating hidden lines.

The Skyline Problem • A building Bi is represented as a triplet (Li, Hi, Ri): left x coordinate, height, right x coordinate • A skyline of a set of n buildings is a list of x coordinates and the heights connecting them • Input: (1, 11, 5), (2, 6, 7), (3, 13, 9), (12, 7, 16), (14, 3, 25), (19, 18, 22), (23, 13, 29), (24, 4, 28) • Output: (1, 11), (3, 13), (9, 0), (12, 7), (16, 3), (19, 18), (22, 3), (23, 13), (29, 0)

Skyline – Solution 1 • Base case: If the number of buildings is n=1, the skyline is the building itself • Inductive step: We assume that we know how to solve the skyline for n-1 buildings, and then we add the n’th building to the skyline

Adding One Building to the Skyline • MergeBuilding(building, skyline): We scan the skyline, looking at one horizontal line after the other, and adjust it wherever the building is higher than the skyline • Worst case: O(n)

Skyline – Solution 1 Analysis
Algorithm Skyline_Sol1(n: integer) returns Skyline
  if n = 1 then
    return Building[1]
  else
    sky1 = Skyline_Sol1(n-1);                 // T(n-1)
    sky2 = MergeBuilding(Building[n], sky1);  // O(n)
    return sky2;
  end if
end algorithm
T(n)=T(n-1)+O(n), so the algorithm is O(n^2)

Skyline – Solution 2 • Base case: If the number of buildings is n=1, the skyline is the building itself • Inductive step: We assume that we know how to solve the skyline for n/2 buildings; we solve each half and then merge the two skylines

Merging two skylines • MergeSkylines(sky1, sky2): We scan the two skylines together, from left to right, match x coordinates and adjust the height where needed • Worst case: O(n1+n2), where n1 and n2 are the sizes of the two skylines

Skyline – Solution 2 Analysis
Algorithm Skyline_Sol2(left, right: integer) returns Skyline
  if left = right then
    return Building[left]
  else
    middle = (left+right)/2
    sky1 = Skyline_Sol2(left, middle);     // T(n/2)
    sky2 = Skyline_Sol2(middle+1, right);  // T(n/2)
    sky3 = MergeSkylines(sky1, sky2);      // O(n)
    return sky3;
  end if
end algorithm
T(n)=2T(n/2)+O(n), so the algorithm is O(n log n)
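
The divide-and-conquer skyline can be tried out with the Python sketch below. The representation is an assumption of this sketch: a skyline is a list of (x, height) points sorted by x, in the same style as the Output line of the problem statement, and a single building (L, H, R) becomes [(L, H), (R, 0)]. The names `merge_skylines` and `skyline` are hypothetical stand-ins for MergeSkylines and Skyline_Sol2.

```python
def merge_skylines(sky1, sky2):
    """Merge two skylines given as lists of (x, height) sorted by x."""
    result = []
    h1 = h2 = 0  # current heights of the two skylines
    i = j = 0
    while i < len(sky1) or j < len(sky2):
        # Next change point of each skyline (infinity once exhausted).
        x1 = sky1[i][0] if i < len(sky1) else float("inf")
        x2 = sky2[j][0] if j < len(sky2) else float("inf")
        x = min(x1, x2)
        if x1 == x:
            h1 = sky1[i][1]
            i += 1
        if x2 == x:
            h2 = sky2[j][1]
            j += 1
        h = max(h1, h2)
        # Record a point only when the visible height changes.
        if not result or result[-1][1] != h:
            result.append((x, h))
    return result


def skyline(buildings, left=0, right=None):
    """Skyline of buildings[left..right], each given as (L, H, R)."""
    if right is None:
        right = len(buildings) - 1
    if left == right:
        l, h, r = buildings[left]
        return [(l, h), (r, 0)]
    middle = (left + right) // 2
    sky1 = skyline(buildings, left, middle)
    sky2 = skyline(buildings, middle + 1, right)
    return merge_skylines(sky1, sky2)


# Input taken from the problem statement slide.
buildings = [(1, 11, 5), (2, 6, 7), (3, 13, 9), (12, 7, 16),
             (14, 3, 25), (19, 18, 22), (23, 13, 29), (24, 4, 28)]
print(skyline(buildings))
```

On the input from the problem statement this should reproduce the Output list given there. The merge visits each point of both skylines once, which is the O(n1+n2) combine step behind T(n)=2T(n/2)+O(n).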

Skyline - Conclusion • When the effort of combining the subproblems cannot be reduced, it is more efficient to split into several subproblems of the same type which are of equal sizes – T(n)=T(n-1)+O(n) … O(n^2) – T(n)=2T(n/2)+O(n) … O(n log n) • This technique is Divide and conquer

Efficiently dividing and combining • Solving a problem takes the following 3 actions: – Divide the problem into a number of subproblems that are smaller instances of the same problem. – Solve the subproblems – Combine the solutions to the subproblems into the solution for the original problem. • Execution time is defined by recurrences like – T(size)=Sum(T(subproblem size)) + CombineTime – Choosing the right subproblem sizes and the right way of combining them decides the performance!

Recurrences • T(n)=O(1), if n=1 • For n>1, we may have different recurrence relations, according to the way of splitting into subproblems and of combining them. Most common are: – T(n)=T(n-1)+O(n) … O(n^2) – T(n)=T(n-1)+O(1) … O(n) – T(n)=2T(n/2)+O(n) … O(n log n) – T(n)=2T(n/2)+O(1) … O(n) – T(n)=a*T(n/b)+f(n)
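
The growth rates listed above can be checked numerically. The Python sketch below evaluates the four concrete recurrences with T(1)=1 and with the O(n) and O(1) terms taken as exactly n and 1; these constants, and the helper names `linear_step` and `halving`, are assumptions made for illustration.

```python
def linear_step(n, work):
    """Evaluate T(n) = T(n-1) + work(n) with T(1) = 1, iterating upward."""
    t = 1
    for m in range(2, n + 1):
        t += work(m)
    return t


def halving(n, work):
    """Evaluate T(n) = 2*T(n/2) + work(n) with T(1) = 1, n a power of two."""
    if n == 1:
        return 1
    return 2 * halving(n // 2, work) + work(n)


n = 1024  # a power of two, so n/2 is always exact
print("T(n-1)+n :", linear_step(n, lambda m: m))  # n(n+1)/2
print("T(n-1)+1 :", linear_step(n, lambda m: 1))  # n
print("2T(n/2)+n:", halving(n, lambda m: m))      # n*log2(n) + n
print("2T(n/2)+1:", halving(n, lambda m: 1))      # 2n - 1
```

The closed forms in the comments hold for these specific constants; with other constants the values change but the orders of growth stay the same.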

Conclusions (1) • What is Design by induction ? • An algorithm design method that uses the idea behind induction to solve problems – Instead of thinking about our algorithm as a sequence of steps to be executed, think of proving a theorem that the algorithm exists • We need to prove that this “theorem” holds for a base case, and that if it holds for “n-1” this implies that it holds for “n”

Conclusions (2) • Why/when should we use Design by induction? – It is a well defined method for approaching a large variety of problems (“where do I start?”) • Just take the statement of the problem and consider it as a theorem to be proven by induction – This method is always safe: designing by induction gets us to a correct algorithm, because we designed it by proving its correctness – Encourages abstract thinking vs coding => you can handle the reasoning in case of complex algorithms • A[1..n]; A[n/4+1..3n/4] • L, R; L=(R+3L+1)/4, R=(L+3R+1)/4 – We can make it also efficient (see next slide)

Conclusions (3) • The inductive step is always based on a reduction from problem size n to problems of size <n. How to make the reduction to smaller problems efficiently: – Sometimes one has to spend some effort to find the suitable element to remove (see Celebrity Problem) – If the amount of work needed to combine the subproblems is not trivial, reduce by dividing into subproblems of equal size – divide and conquer (see Skyline Problem)