LucasLehmer Primality Tester Team W4 Nathan Stohs W

  • Slides: 50
Download presentation
Lucas-Lehmer Primality Tester Team: W-4 Nathan Stohs W 4 -1 Brian Johnson W 4

Lucas-Lehmer Primality Tester Team: W-4 Nathan Stohs W 4 -1 Brian Johnson W 4 -2 Joe Hurley W 4 -3 Marques Johnson W 4 -4 Design Manager: Prateek Goenka 1

Ancient Greeks • 300 BC Euclid’s Proof • Proved that were an infinite number

Ancient Greeks • 300 BC Euclid’s Proof • Proved that were an infinite number of Prime numbers that were irregularly spaced 2

How to find Prime Numbers • The method used for smaller numbers is called

How to find Prime Numbers • The method used for smaller numbers is called Sieve of Eratosthenes from 240 BC • Trial Division is another method for smaller numbers 3

43 rd Known Mersenne Prime Found!! • December 2005 • Dr. Curtis Cooper and

43 rd Known Mersenne Prime Found!! • December 2005 • Dr. Curtis Cooper and Dr. Steven Boone • Professors at Central Missouri State University • 230, 402, 457 -1 4

rank prime digits who when reference 1 230402457 -1 9152052 G 9 2005 Mersenne

rank prime digits who when reference 1 230402457 -1 9152052 G 9 2005 Mersenne 43 2 225964951 -1 7816230 G 8 2005 Mersenne 42 3 224036583 -1 7235733 G 7 2004 Mersenne 41 4 220996011 -1 6320430 G 6 2003 Mersenne 40 5 213466917 -1 4053946 G 5 2001 Mersenne 39 6 27653. 29167433+1 2759677 SB 8 2005 7 28433. 27830457+1 2357207 SB 7 2004 8 26972593 -1 2098960 G 4 1999 9 5359. 25054502+1 1521561 SB 6 2003 10 4847. 23321063+1 Mersenne 38 999744 SB 9 2005 5

Prime Number Competitions • Electronic Frontier Foundation • $50, 000 to the first individual

Prime Number Competitions • Electronic Frontier Foundation • $50, 000 to the first individual or group who discovers a prime number with at least 1, 000 decimal digits (awarded Apr. 6, 2000) • $100, 000 to the first individual or group who discovers a prime number with at least 10, 000 decimal digits • $150, 000 to the first individual or group who discovers a prime number with at least 100, 000 decimal digits • $250, 000 to the first individual or group who discovers a prime number with at least 1, 000, 000 decimal digits 6

Mersenne Prime Algorithm • For P > 2 • 2 P-1 is prime if

Mersenne Prime Algorithm • For P > 2 • 2 P-1 is prime if and only if Sp-2 is zero in this sequence: • S 0 = 4 • SN = (SN-12 - 2) mod (2 P-1) 7

Example to Show 27 - 1 is Prime • • 27 – 1 =

Example to Show 27 - 1 is Prime • • 27 – 1 = 127 S 0 = 4 S 1 = (4 * 4 - 2) mod 127 = 14 S 2 = (14 * 14 - 2) mod 127 = 67 S 3 = (67 * 67 - 2) mod 127 = 42 S 4 = (42 * 42 - 2) mod 127 = 111 S 5 = (111 * 111 - 2) mod 127 = 0 8

Algorithmic description We knew the computations needed, but how to translate that to gates?

Algorithmic description We knew the computations needed, but how to translate that to gates? Computations needed: -Squaring (not a problem…) -Add/Subtract (not a problem…) -Modulo (2^n – 1) multiplication (? ) 9

Mechanisms behind the math • If done with brute force, modulo 2^n-1 could have

Mechanisms behind the math • If done with brute force, modulo 2^n-1 could have been ugly. – Would need to square and find the remainder via division. • Luckily, for that specific computation, math is on our side, the 2^n-1 constraint saves us from division, as will be seen. • A quick search on www. ieee. com produced inspiration. • Taken from “Efficient VLSI Implementation of Modulo (2^n +- 1) Addition and Multiplication” Reto Zimmermann Swiss Federal Institute of Technology (ETH) 10

Useful Math: Multiplication Just like any other multiplication, a modulo multiplication can be computed

Useful Math: Multiplication Just like any other multiplication, a modulo multiplication can be computed by (modulo) summing the partial products. So modulo multiplication is multiplication using a modulo adder. 11

Useful Math: The Modulo Adder The more logic driven math that is the basis

Useful Math: The Modulo Adder The more logic driven math that is the basis of our modulo adder. 12

Last Bits: Modulo Reduction At various points, such as when finding the partial product,

Last Bits: Modulo Reduction At various points, such as when finding the partial product, the result has to be reduced. There is a nifty way to do that as well. 13

Block Diagram 16 clk start Mod Calc P 1 16 4 FSM 1 Mod

Block Diagram 16 clk start Mod Calc P 1 16 4 FSM 1 Mod Multiply 2 clk 16 2 done 16 Subtract 2 2 16 clk Register r 4 16 1 Count clk Compare Out 1 14

Mod Multiply Block Diagram from register 16 2 Counter 4 16 P FSM clk

Mod Multiply Block Diagram from register 16 2 Counter 4 16 P FSM clk Next Partial Product 16 16 Mod add 16 Register 2 2 p-1 to sub 2 FSM clk 16 15

Block Diagram 16 clk start Mod Calc P 1 16 4 FSM 1 Mod

Block Diagram 16 clk start Mod Calc P 1 16 4 FSM 1 Mod Multiply 2 clk 16 2 done 16 Subtract 2 2 16 clk Register r 2 16 1 Count clk Compare Out 1 16

The Process So far: Design Process - Found Mathematical Means (core algorithm) - Found

The Process So far: Design Process - Found Mathematical Means (core algorithm) - Found Computational Means (modulo multiplier, adder) From the above, a high level C program was written in a manner that would easily translate to verilog and gates, or at least more standard operations int mod_square_minus(int value, int p, int offset) { int acc, i; int mod = (1 << p) - 1; for(acc=offset, i=0; i<(sizeof(int)*8 -1); i++) { int a = (value >> i) & 1; int temp; if (a) { if (i-p > 0) temp = value << (i-p); else temp = value >> (p-i); acc = acc + temp + ((value << i) & ((1 << p) - 1)); } if (acc >= mod) acc = acc - mod; } return acc; } This easily translated into behavorial verilog, and readily turned into a gatelevel implementation. Essentially it was written in a more low-level manner. 17

Design Process The rest of the design can simply be thought of as a

Design Process The rest of the design can simply be thought of as a wrapper for the modulo multiplier. The following slides contain Verilog code that was directly taken from the C code below. module mod_mult(out, itr. Count, x, y, mod, p, reset, en, clk); input [15: 0] x, y, mod, p; output [15: 0] out; input reset, en, clk; wire [15: 0] pp, ma 0, temp; output [3: 0] itr. Count; Top level of multiplier counter mycount(itr. Count, reset, en, clk); partial_product ppg(pp, x, y, itr. Count, mod, p); mod_add mod. Adder(out, pp, temp, mod); dff_16_lp partial(clk, out, temp, reset, en); endmodule 18

module partial_product(out, x, y, i, mod, p); output [15: 0] out; input [15: 0]

module partial_product(out, x, y, i, mod, p); output [15: 0] out; input [15: 0] x, y, mod, p; input [3: 0] i; Partial Product Unit w/ modulo reduction wire [15: 0] diff 1, diff 2, added, result, corrected, final; wire [15: 0] high, low, shifted, toadd; wire cout 1, cout 2, ithbith, toobig; sub_16 difference 1(diff 1, cout 1, {12'b 0, i}, p); sub_16 difference 2(diff 2, cout 2, p, {12'b 0, i}); shift_left shift. L(high, y, diff 1[3: 0]); shift_right shift. R(low, y, diff 2[3: 0]); mux 16 choose(high, low, shifted, cout 1); shift_left shift. L 2(toadd, y, i); and 16 bigand(added, toadd, mod); fulladder_16 addhighlow(. out(result), . xin(added), . yin(shifted), . cin({1'b 0}), . cout(nowhere)); sub_16 correct(. out(corrected), . cout(toobig), . xin(mod), . yin(result)); mux 16 correction. Mux(. out(final), . high(corrected), . low(result), . sel(toobig)); shift_right ibit({15'b 0, ithbit}, x, i); select 16 checkfor 0(. out(out), . x(result), . sel(ithbit)); endmodule 19

Modulo Adder module mod_add(out, x, y, mod); input [15: 0] x, y, mod; output

Modulo Adder module mod_add(out, x, y, mod); input [15: 0] x, y, mod; output [15: 0] out; wire cout, is. Double, cin; wire [15: 0] plus, lowbits, done, mod_bar, check; fulladder_16 add(. out(plus), . xin(x), . yin(y), . cin(cin), . cout()); invert_16 inverter(mod_bar, mod); and 16 hihnbits(check, plus, mod_bar); and 16 lownbits(done, plus, mod); or 8 (cin, check[0], check[1], check[2], check[3], check[4], check[5], check[6], check[7], check[8], check[9], check[10], check[11], check[12], check[13], check[14], check[15]); compare_16 checkfordouble(is. Double, done, 16'b 1111_1111_1111); mux 16 fixdouble(. out(out), . high(16'b 0), . low(done), . sel(is. Double)); endmodule 20

Final Design Process Notes • Lessons learned: Never tweak the schematics without retesting the

Final Design Process Notes • Lessons learned: Never tweak the schematics without retesting the verilog first. • Considering total time spent during this phase, roughly half was on the “core” and the FSM, the rest on the “wrapper”. 21

Road to verification : C 2 Examples of the high-level C implementations: Tyrion: ~/Desktop/15525

Road to verification : C 2 Examples of the high-level C implementations: Tyrion: ~/Desktop/15525 nstohs$. /prime 4 7 round 1: (4 * 4 - 2) mod 127 = 14 round 2: (14 * 14 - 2) mod 127 = 67 round 3: (67 * 67 - 2) mod 127 = 42 round 4: (42 * 42 - 2) mod 127 = 111 round 5: (111 * 111 - 2) mod 127 = 0 2^7 -1 is prime Tyrion: ~/Desktop/15525 nstohs$. /prime 4 11 round 1: (4 * 4 - 2) mod 2047 = 14 round 2: (14 * 14 - 2) mod 2047 = 194 round 3: (194 * 194 - 2) mod 2047 = 788 round 4: (788 * 788 - 2) mod 2047 = 701 round 5: (701 * 701 - 2) mod 2047 = 119 round 6: (119 * 119 - 2) mod 2047 = 1877 round 7: (1877 * 1877 - 2) mod 2047 = 240 round 8: (240 * 240 - 2) mod 2047 = 282 round 9: (282 * 282 - 2) mod 2047 = 1736 2^11 -1 is not prime 22

Road to verification: Verilog Samples of Verilog Verification output: Tests were either specific tests

Road to verification: Verilog Samples of Verilog Verification output: Tests were either specific tests on important units such as Partial_Product Partial Product Unit p = 7 380 pp. Out= 56, x= 14, y= 14, i= 2, mod= 127, p= 7 400 pp. Out= 112, x= 14, y= 14, i= 3, mod= 127, p= 7 420 pp. Out= 0, x= 14, y= 14, i= 4, mod= 127, p= 7 440 pp. Out= 0, x= 14, y= 14, i= 5, mod= 127, p= 7 Top Level p = 7 itr. Out= x itr. Out= 4 itr. Out= 14 itr. Out= 67 itr. Out= 42 itr. Out= 111 itr. Out= 0 Top Level p = 11 itr. Out= x itr. Out= 4 itr. Out= 194 itr. Out= 788 itr. Out= 701 itr. Out= 119 itr. Out= 1877 … …our top level tests. Note that these are the same results generated from the C code 23

Road to verification: Schematic I Schematic Test of our modular adder. 128 + 68

Road to verification: Schematic I Schematic Test of our modular adder. 128 + 68 Mod 127 = 69 24

Road to verification: Schematic II Plot of the top level output after a single

Road to verification: Schematic II Plot of the top level output after a single iteration, p=7 Output after a single iteration is 14, the expected value. 25

Road to verification: Schematic III The simulation outputs after a full run, showing the

Road to verification: Schematic III The simulation outputs after a full run, showing the results of all iterations. Simulations start taking a long time. More on that later. 26

Road to verification: Intermission Disk Space required for a full-length schematic test of p=7

Road to verification: Intermission Disk Space required for a full-length schematic test of p=7 : 6 GB Time required for a full-length schematic test of p=7 : 4 hours Disk Space required for a full-length extracted test of p=7 : more Time required for a full-length extracted test of p=7 : longer Disk Space required for a full-length extracted. RC test of p=7 : 1 i. Pod Time required for a full-length extracted. RC test of p=7 : T_T Simulations become very demanding and lengthy due to tests needing to be “deep” to be useful. To meet such demands, be sure to use Genuine AMD© CPUs. 27

Road to verification: Layout I 3 words: “the net-lists match” Of course, there is

Road to verification: Layout I 3 words: “the net-lists match” Of course, there is far more to be concerned about. Due to simulator issues, layout simulations were delayed on some major modules. Partial Product Sims In Progress (I Hope) 28

Road to verification: Layout II Top Level layout Sims in Progress 29

Road to verification: Layout II Top Level layout Sims in Progress 29

Road to verification: Timing Pathmill was useful to help us gauge our critical path,

Road to verification: Timing Pathmill was useful to help us gauge our critical path, which is one cycle through our modulo multiplier. When run on the top level, a critical path of 12. 703 ns was found. This was in the ballpark relative to our research. Layout Timing Sims in progress 30

Issues • extracted. RC of partial_product module • Registers switch • Switching from parallel

Issues • extracted. RC of partial_product module • Registers switch • Switching from parallel calculations to series – Transistor count vs. clock cycles • Syncing up design between people – Transferring files – Different design styles • LONG simulation times • Floorplanning – Too much emphasis on aspect ratios and not enough on wiring – Couldn’t decide on one set floorplan 31

Floorplan v 1. 0 Mod Multiplier Prime Logic Memory Mod Adder FSM 32

Floorplan v 1. 0 Mod Multiplier Prime Logic Memory Mod Adder FSM 32

Floorplan v 2. 0 33

Floorplan v 2. 0 33

Floorplan v 3. 0 34

Floorplan v 3. 0 34

Floorplan v 4. 0 35

Floorplan v 4. 0 35

Floorplan v 5. 0 36

Floorplan v 5. 0 36

Final Floorplan 37

Final Floorplan 37

Pin Specs Pin Type # of Pins Vdd! In/Out 1 Gnd! In/Out 1 p<0:

Pin Specs Pin Type # of Pins Vdd! In/Out 1 Gnd! In/Out 1 p<0: 15> In 16 clk In 1 start In 1 Done Out 1 out Out 1 Total - 22 38

Initial Part Specs Module Transistor Count Area (µm²) Transistor Density FSM 300 900 .

Initial Part Specs Module Transistor Count Area (µm²) Transistor Density FSM 300 900 . 33 mod_p 2, 440 7, 000 . 35 mod_add 1, 282 9, 000 . 14 partial_product 8, 676 65, 000 . 13 count 1, 656 6, 000 . 27 sub_16 704 3, 500 . 20 Registers 1, 848 6, 000 . 30 compare 36 300 . 12 Total 16, 942 97, 700 . 17 39

Final Part Specs Module Transistor Count Area (µm²) Transistor Density Aspect Ratio FSM 152

Final Part Specs Module Transistor Count Area (µm²) Transistor Density Aspect Ratio FSM 152 1, 200 . 13 2. 45 mod_p 1, 280 8, 603 . 15 0. 79 mod_add 1, 168 5, 603 . 21 2. 40 partial_product 7, 520 54, 680 . 14 1. 16 count 1, 424 8, 701 . 16 6. 88 sub_16 576 2, 934 . 20 4. 49 Registers 896 6, 028 . 15 4. 76 compare 56 201 . 28 4. 41 Total 13, 702 86, 621 . 16 1. 01 40

Chip Specs • • • Transistor Count: 13, 702 Size: 296. 51µm x 292.

Chip Specs • • • Transistor Count: 13, 702 Size: 296. 51µm x 292. 13µm Area: 86, 621µm² Aspect Ratio: 1. 01: 1 Density: 0. 16 transistors/µm² 41

Final Floorplan 42

Final Floorplan 42

Final Floorplan 43

Final Floorplan 43

Poly Layer Density: 7. 14% 44

Poly Layer Density: 7. 14% 44

Active Layer Density: 8. 76% 45

Active Layer Density: 8. 76% 45

Metal 1 Layer Density: 23. 86% 46

Metal 1 Layer Density: 23. 86% 46

Metal 2 Layer Density: 19. 97% 47

Metal 2 Layer Density: 19. 97% 47

Metal 3 Layer Density: 11. 30% 48

Metal 3 Layer Density: 11. 30% 48

Metal 4 Layer Density: 10. 34% 49

Metal 4 Layer Density: 10. 34% 49

Conclusions • Plan for buffers – Can’t put them in after the fact •

Conclusions • Plan for buffers – Can’t put them in after the fact • Your design will change dramatically from start to finish so be flexible • Communication is key • Do layout in parallel 50