Sieve of Eratosthenes Nov 20 2012 copyright 2012

Sieve of Eratosthenes
• The Sieve of Eratosthenes is an algorithm to find the prime numbers between 2 and n
• Start with an array of booleans for 2 to n, all initially set to true
• For each known prime, starting with 2, mark all the multiples (composites) of that prime by setting them to false
• Stop when the next prime > √n
• The entries left unmarked (still true) are the primes
Nov 20, 2012 © copyright 2012, Clayton S. Ferner, UNC Wilmington; Barry Wilkinson, UNC Charlotte

Sieve of Eratosthenes
[Figure: the numbers 2 through 64 laid out in a grid, shown once before any marking and once for Next Prime = 2.]

Sieve of Eratosthenes
[Figure: the same grid of 2 through 64, shown for Next Prime = 3 and for Next Prime = 5.]

Sieve of Eratosthenes
[Figure: the same grid of 2 through 64, shown for Next Prime = 7.]
Next Prime = 11: 11² = 121 > 64, so stop.
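The walk-through on the grid can be replayed in code. The following is a minimal sketch, not the authors' implementation: the function name `sieve_trace` and its parameters are hypothetical, and it assumes n is inclusive, as in the grid of 2 through 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sieve the values 2..n (inclusive). The primes actually used for marking
   (those with prime * prime <= n) are stored in used[], at most max_used of
   them, and their total number is written to *num_used.
   Returns the number of primes in 2..n, or -1 if allocation fails. */
int sieve_trace(int n, int *used, int max_used, int *num_used) {
    assert(n >= 2);
    bool *marked = malloc(((size_t)n + 1) * sizeof *marked);
    if (!marked) return -1;
    for (int i = 0; i <= n; i++) marked[i] = true;    /* true = not yet marked composite */

    int k = 0;
    int prime = 2;
    while ((long)prime * prime <= n) {
        if (k < max_used) used[k] = prime;            /* record the sieving prime */
        k++;
        for (long c = (long)prime * prime; c <= n; c += prime)
            marked[c] = false;                        /* mark multiples of prime */
        do { prime++; } while (!marked[prime]);       /* next value still true */
    }
    *num_used = k;

    int count = 0;
    for (int i = 2; i <= n; i++) count += marked[i];
    free(marked);
    return count;
}
```

For n = 64 this uses the sieving primes 2, 3, 5 and 7, stops at 11, and leaves 18 values unmarked.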

Sequential Sieve of Eratosthenes Algorithm
    Sieve_Eratosthenes() {
        boolean marked[n+1] = { true, . . . };    // entries 2..n are used; true = not marked as composite
        prime = 2;
        while (prime * prime <= n) {              // or prime <= sqrt(n)
            composite = prime * prime;            // start with prime^2
            while (composite <= n) {
                marked[composite] = false;
                composite = composite + prime;    // multiples of prime
            }
            do {
                prime++;
            } while (!marked[prime]);             // find next prime: skip values marked as composite
        }
    }

Sequential Complexity
• The outermost loop will iterate at most √n times
• The 1st inner loop could iterate up to n/2 times
• The 2nd loop will iterate √n times total (for all iterations of the outermost loop)
• Complexity =
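Taking the three loop bounds above at face value gives a loose upper bound. The sketch below works that arithmetic through and sets it beside the tighter bound usually quoted for the sieve. The symbols T_seq and q are notation introduced here, not taken from the slides.

```latex
% Loose bound: (outer iterations) x (first inner loop) + (second inner loop, total)
T_{\text{seq}}(n) = O\!\left(\sqrt{n}\cdot\frac{n}{2} + \sqrt{n}\right) = O\!\left(n\sqrt{n}\right)

% Tighter bound: a sieving prime q marks at most n/q values
T_{\text{seq}}(n) \le \sum_{\substack{q \le \sqrt{n} \\ q\ \text{prime}}} \frac{n}{q} = O(n \log\log n)
```

The gap between the two comes from the inner loop: it runs n/2 times only for prime 2, and far fewer times for larger primes.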

Parallelizing Sieve of Eratosthenes
• An obvious approach is to parallelize the loop marking multiples of prime
    composite = prime * prime;
    while (composite <= n) {
        marked[composite] = false;
        composite = composite + prime;
    }
• We can rewrite this as a for loop:
    for (composite = prime * prime; composite <= n; composite += prime) {
        marked[composite] = false;
    }

Parallelizing Sieve of Eratosthenes
• The difficulty is in selecting the next prime
• We have to do a reduction and broadcast of the marked array so all processors can continue updating the array

Parallelizing Sieve of Eratosthenes
    while (prime * prime <= n) {
        // Parallelize this loop
        for (composite = prime * prime; composite <= n; composite += prime) {
            marked[composite] = false;
        }
        MPI_Reduce (marked, . . . , MPI_LAND, . . . );
        MPI_Bcast (marked, . . . );
        do {
            prime++;
        } while (!marked[prime]);    // skip values marked as composite
    }

• There is a fair bit of communication within each iteration of the outer loop
• Furthermore, the size of the array being communicated can be large (n)
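To make the cost of this approach concrete, here is a single-process sketch that imitates it without MPI. Each simulated worker marks every p-th multiple in a private copy of the array, and a logical-AND pass stands in for MPI_Reduce with MPI_LAND followed by MPI_Bcast. The function name `loop_parallel_sieve` and the cyclic split of the loop are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Sieve 2..n (inclusive) with the marking loop split over p simulated workers.
   Returns the number of primes found, or -1 if allocation fails. */
int loop_parallel_sieve(int n, int p) {
    assert(n >= 2 && p >= 1);
    size_t len = (size_t)n + 1;
    bool *marked = malloc(len * sizeof *marked);
    bool *copies = malloc((size_t)p * len * sizeof *copies);   /* p private arrays */
    if (!marked || !copies) { free(marked); free(copies); return -1; }
    for (size_t i = 0; i < len; i++) marked[i] = true;

    int prime = 2;
    while ((long)prime * prime <= n) {
        /* Every worker starts the iteration from the agreed array. */
        for (int r = 0; r < p; r++)
            memcpy(copies + (size_t)r * len, marked, len * sizeof *marked);

        /* Worker r takes iterations r, r+p, r+2p, ... of the marking loop. */
        for (int r = 0; r < p; r++) {
            bool *mine = copies + (size_t)r * len;
            for (long c = (long)prime * prime + (long)r * prime; c <= n; c += (long)p * prime)
                mine[c] = false;
        }

        /* Logical-AND reduction over all n+1 entries, then shared by everyone:
           a value stays true only if no worker cleared it. */
        for (size_t i = 0; i < len; i++) {
            bool all = true;
            for (int r = 0; r < p; r++) all = all && copies[(size_t)r * len + i];
            marked[i] = all;
        }

        do { prime++; } while (!marked[prime]);
    }

    int count = 0;
    for (int i = 2; i <= n; i++) count += marked[i];
    free(marked);
    free(copies);
    return count;
}
```

Note that the reduction touches the whole array once per sieving prime, which is the overhead the bullets above point at.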

Parallelizing Sieve of Eratosthenes
• A better approach is to partition the array instead of the loop
• With p processors, each processor is responsible for about n/p elements of the array
• It is the master process' responsibility to determine the next prime
• This needs to be broadcast, but we can eliminate the reduction
• Also, only a single integer needs to be broadcast instead of an array

Sieve of Eratosthenes
The primes up to √n will be used for marking. These need to be within the master process' region.
    p0:  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    p1: 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
    p2: 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
    p3: 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64

Parallelizing Sieve of Eratosthenes
• First we need to make sure the master process' region covers every value up to √n
• The master process needs to be able to determine all values used for prime
    proc0_size = (n-1)/p;
    if (2 + proc0_size < (int) sqrt((double) n)) {
        if (rank == 0) printf("Too many processors\n");
        MPI_Finalize();
        return 1;
    }
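The partition and the "too many processors" test can be sketched without MPI. The names `Block`, `block_range` and `master_owns_sieving_primes` below are hypothetical. The block formula is one common choice that reproduces the 2..16, 17..32, 33..48, 49..64 layout for n = 64 and p = 4. The test is written with integer arithmetic and is slightly stricter than the slide's: it requires the master's last value to be at least ⌊√n⌋.

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive range [low, high] of the values 2..n owned by one process. */
typedef struct {
    int low;
    int high;
} Block;

/* Split the n-1 values 2..n into p nearly equal contiguous blocks. */
Block block_range(int rank, int p, int n) {
    assert(n >= 2 && p >= 1 && rank >= 0 && rank < p);
    long count = (long)n - 1;                        /* number of values in 2..n */
    Block b;
    b.low  = (int)(2 + rank * count / p);
    b.high = (int)(2 + (rank + 1) * count / p - 1);
    return b;
}

/* True when rank 0's block reaches floor(sqrt(n)), so that every prime
   used for marking lies inside the master's region. */
bool master_owns_sieving_primes(int p, int n) {
    Block b0 = block_range(0, p, n);
    long next = (long)b0.high + 1;
    return next * next > n;                          /* high >= floor(sqrt(n)) */
}
```

With n = 64, four processes pass the test (the master holds 2..16), while 32 processes fail it because the master would hold only the value 2.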

Parallelizing Sieve of Eratosthenes
    // Partition the array: each processor is responsible for the values low_value..high_value (inclusive)
    low_value  = 2 + rank*(n-1)/p;
    high_value = 2 + (rank+1)*(n-1)/p - 1;
    prime = 2;    // everyone starts at 2
    do {
        // Compute the first multiple of prime greater than or equal to low_value,
        // but never start below prime*prime, so that prime itself is not marked
        if (prime * prime >= low_value) first = prime * prime;
        else if (low_value % prime == 0) first = low_value;
        else first = low_value + prime - (low_value % prime);
        // Mark multiples of prime in each processor's range
        for (i = first; i <= high_value; i += prime)
            marked[i - low_value] = false;

Parallelizing Sieve of Eratosthenes
        if (rank == 0) {
            // Master determines next prime and broadcasts it
            do {
                prime++;
            } while (!marked[prime - low_value]);    // skip values marked as composite
        }
        MPI_Bcast (&prime, . . . );
    } while (prime * prime <= n);
    // No reduction is needed until we finish the computation
    MPI_Gather (marked, . . . );
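The per-block marking step is the part of this listing that is easiest to get wrong, so here it is as a small standalone sketch. The helper names `first_multiple` and `mark_block` are hypothetical. Two assumptions are made: the block is an inclusive range [low, high] stored in a local array indexed from 0, and marking never starts below prime * prime.

```c
#include <assert.h>
#include <stdbool.h>

/* First value this block has to mark for the given prime: at least
   prime * prime (smaller multiples belong to smaller primes) and at least low. */
long first_multiple(int low, int prime) {
    assert(low >= 2 && prime >= 2);
    long start = (long)prime * prime;
    if (start >= low) return start;
    long rem = low % prime;
    return rem == 0 ? low : low + (prime - rem);
}

/* marked[] covers only this block: marked[i - low] describes the value i.
   Returns how many entries were set to false (entries that were already
   false are counted again). */
int mark_block(bool *marked, int low, int high, int prime) {
    int touched = 0;
    for (long i = first_multiple(low, prime); i <= high; i += prime) {
        marked[i - low] = false;
        touched++;
    }
    return touched;
}
```

For the block 17..32 and prime 3, marking starts at 18 and clears 18, 21, 24, 27 and 30. For the master's block 2..16 it starts at 9, so 3 itself stays unmarked.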

Parallel Complexity
• The outermost loop will iterate √n times
• The 1st inner loop could iterate up to n/(2p) times with p processors
• The 2nd loop will iterate √n times total (for all iterations of the outermost loop)
• Each broadcast takes log(p) time
• Complexity = computation + communication
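Combining the bullets above term by term gives the following loose estimate. It is a sketch of the arithmetic implied by those bounds, not necessarily the formula on the original slide, and T_par is notation introduced here.

```latex
T_{\text{par}}(n, p)
  = \underbrace{O\!\left(\sqrt{n}\cdot\frac{n}{2p} + \sqrt{n}\right)}_{\text{computation}}
  + \underbrace{O\!\left(\sqrt{n}\,\log p\right)}_{\text{communication}}
  = O\!\left(\frac{n\sqrt{n}}{p} + \sqrt{n}\,\log p\right)
```

The communication term does not shrink as p grows, which is the motivation for removing the broadcast from the outer loop.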

Improving the Parallel Algorithm
• Still we have a broadcast within the outermost loop
• How can we eliminate that?
• Can we have all processors determine what the next prime is?

Improving the Parallel Algorithm
• The number of primes used in the Sieve of Eratosthenes is at most √n
• All processors compute the primes from 2 … √n
• Now all processors have their own private copy of the primes used
• We can eliminate the broadcast
• As well as the requirement that the master process' section be at least √n
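The improved scheme can be imitated in a single process: each simulated rank sieves 2..⌊√n⌋ privately, marks its own block with those primes, and only the final per-block counts are combined. This is a sketch under assumptions, not the authors' code. The function names are hypothetical, the block formula is one common choice, and a plain sum stands in for the final collection step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Count the primes in the inclusive block [low, high] of 2..n using only a
   private sieve of 2..floor(sqrt(n)). Returns -1 if allocation fails. */
int count_primes_in_block(int low, int high, int n) {
    assert(low >= 2 && high >= low && high <= n);
    int root = 1;
    while ((long)(root + 1) * (root + 1) <= n) root++;        /* floor(sqrt(n)) */

    bool *small = malloc(((size_t)root + 1) * sizeof *small); /* private sieve */
    bool *marked = malloc(((size_t)high - low + 1) * sizeof *marked);
    if (!small || !marked) { free(small); free(marked); return -1; }
    for (int v = 0; v <= root; v++) small[v] = true;
    for (int i = 0; i <= high - low; i++) marked[i] = true;

    for (int q = 2; q <= root; q++) {
        if (!small[q]) continue;                              /* q is composite */
        for (long c = (long)q * q; c <= root; c += q) small[c] = false;
        long first = (long)q * q;                             /* never below q*q */
        if (first < low) first = ((long)low + q - 1) / q * q; /* round low up */
        for (long c = first; c <= high; c += q) marked[c - low] = false;
    }

    int count = 0;
    for (int i = 0; i <= high - low; i++) count += marked[i];
    free(small);
    free(marked);
    return count;
}

/* Run every simulated rank in turn and add up the per-block counts. */
int partitioned_count(int n, int p) {
    assert(n >= 2 && p >= 1);
    long values = (long)n - 1;                                /* values 2..n */
    int total = 0;
    for (int rank = 0; rank < p; rank++) {
        int low  = (int)(2 + rank * values / p);
        int high = (int)(2 + (rank + 1) * values / p - 1);
        if (high < low) continue;                             /* empty block */
        int c = count_primes_in_block(low, high, n);
        if (c < 0) return -1;
        total += c;
    }
    return total;
}
```

No rank reads another rank's data inside the marking loop, and the master's block no longer has to reach √n.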

Complexity
• The complexity is essentially the same for most processors, except:
  – Communication is eliminated until the end
  – There is added complexity to compute the primes up to √n sequentially
• Complexity to compute the primes up to √n:
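One way to estimate that added cost is to apply the sequential bounds to an array of size √n. The following is a sketch of that substitution, with m and T_small as notation introduced here; the original slide's formula may differ.

```latex
% Sequential sieve on m = \sqrt{n} values
m = \sqrt{n}

% Loose bound, from O(m\sqrt{m})
T_{\text{small}}(n) = O\!\left(\sqrt{n}\cdot\sqrt[4]{n}\right) = O\!\left(n^{3/4}\right)

% Tighter bound, from O(m \log\log m)
T_{\text{small}}(n) = O\!\left(\sqrt{n}\,\log\log n\right)
```

Every processor pays this cost, so it is not divided by p.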

Discussion Question
• Question: Would it be better to implement this algorithm in shared-memory (using OpenMP) than distributed-memory (using MPI)?