Lucas-Lehmer Primality Tester
- Slides: 50
Lucas-Lehmer Primality Tester. Team W-4: Nathan Stohs (W4-1), Brian Johnson (W4-2), Joe Hurley (W4-3), Marques Johnson (W4-4). Design Manager: Prateek Goenka
Ancient Greeks • 300 BC: Euclid’s proof • Proved that there are infinitely many prime numbers; they are irregularly spaced
How to Find Prime Numbers • The method used for smaller numbers is called the Sieve of Eratosthenes, from 240 BC • Trial division is another method for smaller numbers
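As a rough illustration of the two small-number methods named on this slide, here is a minimal C sketch. The function names `sieve` and `is_prime_trial` are made up for this example and do not come from the project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sieve of Eratosthenes: after the call, is_prime[k] is true exactly
   when k is prime, for 0 <= k <= limit.
   The caller supplies an array with limit + 1 entries. */
void sieve(bool *is_prime, size_t limit)
{
    assert(is_prime != NULL);
    for (size_t k = 0; k <= limit; k++)
        is_prime[k] = (k >= 2);          /* 0 and 1 are not prime */
    for (size_t i = 2; i * i <= limit; i++) {
        if (!is_prime[i])
            continue;
        for (size_t j = i * i; j <= limit; j += i)
            is_prime[j] = false;         /* cross out multiples of i */
    }
}

/* Trial division: try every divisor d with d * d <= n. */
bool is_prime_trial(unsigned n)
{
    if (n < 2)
        return false;
    for (unsigned d = 2; d <= n / d; d++)
        if (n % d == 0)
            return false;
    return true;
}
```

For example, calling `sieve(flags, 127)` on a 128-entry array leaves `flags[127]` true, and `is_prime_trial(2047)` returns false because 2047 = 23 × 89. Both methods become impractical long before numbers reach the size of the Mersenne primes discussed next.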
43rd Known Mersenne Prime Found!! • December 2005 • Dr. Curtis Cooper and Dr. Steven Boone • Professors at Central Missouri State University • $2^{30,402,457} - 1$
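| rank | prime | digits | who | when | reference |
|---|---|---|---|---|---|
| 1 | $2^{30402457} - 1$ | 9152052 | G9 | 2005 | Mersenne 43 |
| 2 | $2^{25964951} - 1$ | 7816230 | G8 | 2005 | Mersenne 42 |
| 3 | $2^{24036583} - 1$ | 7235733 | G7 | 2004 | Mersenne 41 |
| 4 | $2^{20996011} - 1$ | 6320430 | G6 | 2003 | Mersenne 40 |
| 5 | $2^{13466917} - 1$ | 4053946 | G5 | 2001 | Mersenne 39 |
| 6 | $27653 \cdot 2^{9167433} + 1$ | 2759677 | SB8 | 2005 | |
| 7 | $28433 \cdot 2^{7830457} + 1$ | 2357207 | SB7 | 2004 | |
| 8 | $2^{6972593} - 1$ | 2098960 | G4 | 1999 | Mersenne 38 |
| 9 | $5359 \cdot 2^{5054502} + 1$ | 1521561 | SB6 | 2003 | |
| 10 | $4847 \cdot 2^{3321063} + 1$ | 999744 | SB9 | 2005 | |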
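The digit counts in the table can be sanity-checked without computing the primes themselves. Since 2^p is never a power of 10, 2^p − 1 has floor(p · log10(2)) + 1 decimal digits. The sketch below uses a hard-coded constant for log10(2); the function name `mersenne_digits` is hypothetical, and double-precision rounding could in principle misreport an exponent whose product lands extremely close to an integer.

```c
#include <assert.h>

/* Number of decimal digits of 2^p - 1, computed as
   floor(p * log10(2)) + 1. Valid because 2^p is never a power of 10,
   so 2^p - 1 has as many digits as 2^p. */
long mersenne_digits(long p)
{
    const double LOG10_2 = 0.30102999566398120;  /* log10(2) */
    assert(p >= 1);
    return (long)((double)p * LOG10_2) + 1;
}
```

With this sketch, `mersenne_digits(30402457)` gives 9152052 and `mersenne_digits(25964951)` gives 7816230, matching the first two rows of the table.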
Prime Number Competitions
• Electronic Frontier Foundation
• $50,000 to the first individual or group who discovers a prime number with at least 1,000,000 decimal digits (awarded Apr. 6, 2000)
• $100,000 to the first individual or group who discovers a prime number with at least 10,000,000 decimal digits
• $150,000 to the first individual or group who discovers a prime number with at least 100,000,000 decimal digits
• $250,000 to the first individual or group who discovers a prime number with at least 1,000,000,000 decimal digits
Mersenne Prime Algorithm • For $P > 2$ • $2^P - 1$ is prime if and only if $S_{P-2}$ is zero in this sequence: • $S_0 = 4$ • $S_N = (S_{N-1}^2 - 2) \bmod (2^P - 1)$
Example to Show $2^7 - 1$ is Prime
• $2^7 - 1 = 127$
• $S_0 = 4$
• $S_1 = (4 \cdot 4 - 2) \bmod 127 = 14$
• $S_2 = (14 \cdot 14 - 2) \bmod 127 = 67$
• $S_3 = (67 \cdot 67 - 2) \bmod 127 = 42$
• $S_4 = (42 \cdot 42 - 2) \bmod 127 = 111$
• $S_5 = (111 \cdot 111 - 2) \bmod 127 = 0$
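The recurrence above can be written as a short software reference model. This is a minimal sketch using ordinary 64-bit arithmetic and the `%` operator, not the division-free hardware approach the project uses; the name `lucas_lehmer` and the limit on `p` are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Lucas-Lehmer test: returns true when 2^p - 1 is prime.
   Restricted to 2 < p < 32 so that s * s always fits in 64 bits. */
bool lucas_lehmer(unsigned p)
{
    assert(p > 2 && p < 32);
    const uint64_t m = (UINT64_C(1) << p) - 1;   /* 2^p - 1 */
    uint64_t s = 4;                              /* S_0 */
    for (unsigned n = 1; n <= p - 2; n++)
        s = (s * s + m - 2) % m;   /* adding m keeps the value non-negative */
    return s == 0;                 /* prime iff S_(p-2) is zero */
}
```

`lucas_lehmer(7)` returns true, following the five steps listed on the slide, while `lucas_lehmer(11)` returns false because 2^11 − 1 = 2047 = 23 × 89.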
Algorithmic Description. We knew the computations needed, but how do we translate them to gates? Computations needed: - Squaring (not a problem…) - Add/Subtract (not a problem…) - Modulo (2^n – 1) multiplication (?)
Mechanisms behind the math
• If done with brute force, modulo 2^n – 1 could have been ugly.
  – Would need to square and find the remainder via division.
• Luckily, for that specific computation, math is on our side: the 2^n – 1 constraint saves us from division, as will be seen.
• A quick search on www.ieee.com produced inspiration.
• Taken from “Efficient VLSI Implementation of Modulo (2^n ± 1) Addition and Multiplication”, Reto Zimmermann, Swiss Federal Institute of Technology (ETH)
Useful Math: Multiplication. Just like any other multiplication, a modulo multiplication can be computed by (modulo) summing the partial products. So modulo multiplication is ordinary multiplication carried out with a modulo adder.
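One way to picture this in software: modulo 2^n − 1, multiplying by 2^i amounts to rotating an n-bit word left by i places, so each partial product is a rotation and the running sum needs only an addition with the carry wrapped around. The sketch below assumes both operands are already at most 2^n − 1 and n ≤ 16; the names `rotl_n` and `mod_mul` are hypothetical, and this is a behavioral model rather than the team's gate-level design.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate the n-bit value y left by i places (0 <= i < n).
   Modulo 2^n - 1 this equals y * 2^i, because 2^n is congruent to 1. */
static uint32_t rotl_n(uint32_t y, unsigned i, unsigned n)
{
    const uint32_t mod = (UINT32_C(1) << n) - 1;
    return ((y << i) & mod) | (y >> (n - i));
}

/* (x * y) mod (2^n - 1), built by summing rotated partial products. */
uint32_t mod_mul(uint32_t x, uint32_t y, unsigned n)
{
    assert(n >= 1 && n <= 16);
    const uint32_t mod = (UINT32_C(1) << n) - 1;
    assert(x <= mod && y <= mod);

    uint32_t acc = 0;
    for (unsigned i = 0; i < n; i++) {
        if ((x >> i) & 1u) {                  /* bit i of x is set */
            acc += rotl_n(y, i, n);           /* add partial product */
            acc = (acc & mod) + (acc >> n);   /* wrap the carry around */
        }
    }
    return (acc == mod) ? 0 : acc;            /* all ones also means zero */
}
```

For instance, `mod_mul(14, 14, 7)` yields 69, which is 196 mod 127. No division or general-purpose remainder operation is needed at any step.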
Useful Math: The Modulo Adder. The more logic-driven math that forms the basis of our modulo adder.
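A common way to build an adder modulo 2^n − 1 is the end-around carry: add normally, then feed the carry out of bit n − 1 back into the least significant bit. The C sketch below models that behavior under the assumption that both inputs are at most 2^n − 1 and n ≤ 16. The name `mod_add` mirrors the module name used later in the deck, but the body is an illustrative model and not a transcription of the team's circuit.

```c
#include <assert.h>
#include <stdint.h>

/* (x + y) mod (2^n - 1) using an end-around carry instead of division. */
uint32_t mod_add(uint32_t x, uint32_t y, unsigned n)
{
    assert(n >= 1 && n <= 16);
    const uint32_t mod = (UINT32_C(1) << n) - 1;
    assert(x <= mod && y <= mod);

    uint32_t sum = x + y;              /* needs at most n + 1 bits */
    sum = (sum & mod) + (sum >> n);    /* carry out re-enters at bit 0 */
    return (sum == mod) ? 0 : sum;     /* all ones is a second encoding of 0 */
}
```

As a quick check, `mod_add(100, 100, 7)` gives 73, since 200 − 127 = 73, and `mod_add(60, 67, 7)` gives 0 because the raw sum is exactly 127.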
Last Bits: Modulo Reduction. At various points, such as when finding a partial product, the result has to be reduced modulo 2^n – 1. There is a nifty way to do that as well.
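The slide does not spell the trick out, but a standard approach for a modulus of the form 2^n − 1 is to fold the bits above position n − 1 back onto the low n bits, since 2^n is congruent to 1. The sketch below shows that folding under the assumption n ≤ 16; the function name `mod_reduce` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce v modulo 2^n - 1 by repeatedly adding the high bits
   onto the low n bits. No division is involved. */
uint32_t mod_reduce(uint32_t v, unsigned n)
{
    assert(n >= 1 && n <= 16);
    const uint32_t mod = (UINT32_C(1) << n) - 1;
    while (v > mod)
        v = (v & mod) + (v >> n);   /* each pass strictly shrinks v */
    return (v == mod) ? 0 : v;      /* all ones is congruent to zero */
}
```

For example, `mod_reduce(4487, 7)` returns 42, which matches the step (67 * 67 - 2) mod 127 = 42 from the worked example earlier in the deck.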
Block Diagram [figure: top level with blocks Mod Calc, FSM, Mod Multiply, Subtract 2, Register, Count, and Compare; signals clk, start, P, done, Out; 16-bit datapath]
Mod Multiply Block Diagram [figure: blocks Counter, FSM, Next Partial Product, Mod add, and Register; inputs from register, P, and clk; output to Subtract 2; 16-bit datapath]
Block Diagram [figure: top-level block diagram shown again]
The Process So Far: Design Process
- Found mathematical means (core algorithm)
- Found computational means (modulo multiplier, adder)

From the above, a high-level C program was written in a manner that would easily translate to Verilog and gates, or at least to more standard operations:

    /* Returns (value * value + offset) mod (2^p - 1).
       Assumes 0 <= value, offset < 2^p - 1, and p small enough that
       value << i fits in an int for every set bit i of value. */
    int mod_square_minus(int value, int p, int offset) {
        int acc, i;
        int mod = (1 << p) - 1;

        for (acc = offset, i = 0; i < (int)(sizeof(int) * 8 - 1); i++) {
            int a = (value >> i) & 1;   /* bit i of the multiplier */
            int temp;

            if (a) {
                /* High part of value << i, wrapped around past bit p. */
                if (i - p > 0)
                    temp = value << (i - p);
                else
                    temp = value >> (p - i);
                /* Add the wrapped high part plus the low p bits. */
                acc = acc + temp + ((value << i) & mod);
            }
            /* Keep the accumulator below the modulus. */
            if (acc >= mod)
                acc = acc - mod;
        }
        return acc;
    }

This translated easily into behavioral Verilog and was readily turned into a gate-level implementation. Essentially, it was written in a low-level manner.
Design Process
The rest of the design can simply be thought of as a wrapper for the modulo multiplier. The following slides contain Verilog code that was derived directly from the C code above.

Top level of multiplier:

    module mod_mult(out, itrCount, x, y, mod, p, reset, en, clk);
      input  [15:0] x, y, mod, p;
      output [15:0] out;
      input         reset, en, clk;
      wire   [15:0] pp, ma0, temp;
      output [3:0]  itrCount;

      counter         mycount(itrCount, reset, en, clk);
      partial_product ppg(pp, x, y, itrCount, mod, p);
      mod_add         modAdder(out, pp, temp, mod);
      dff_16_lp       partial(clk, out, temp, reset, en);
    endmodule
Partial Product Unit w/ modulo reduction:

    module partial_product(out, x, y, i, mod, p);
      output [15:0] out;
      input  [15:0] x, y, mod, p;
      input  [3:0]  i;
      wire [15:0] diff1, diff2, added, result, corrected, final;
      wire [15:0] high, low, shifted, toadd;
      wire cout1, cout2, ithbit, toobig;

      // Shift amounts built from i and p, as in the C code
      sub_16 difference1(diff1, cout1, {12'b0, i}, p);
      sub_16 difference2(diff2, cout2, p, {12'b0, i});
      shift_left  shiftL(high, y, diff1[3:0]);
      shift_right shiftR(low, y, diff2[3:0]);
      mux16 choose(high, low, shifted, cout1);
      shift_left shiftL2(toadd, y, i);
      and16 bigand(added, toadd, mod);
      fulladder_16 addhighlow(.out(result), .xin(added), .yin(shifted), .cin({1'b0}), .cout(nowhere));
      sub_16 correct(.out(corrected), .cout(toobig), .xin(mod), .yin(result));
      mux16 correctionMux(.out(final), .high(corrected), .low(result), .sel(toobig));
      // Bit i of x decides whether this partial product contributes
      shift_right ibit({15'b0, ithbit}, x, i);
      select16 checkfor0(.out(out), .x(result), .sel(ithbit));
    endmodule
Modulo Adder

    module mod_add(out, x, y, mod);
      input  [15:0] x, y, mod;
      output [15:0] out;
      wire cout, isDouble, cin;
      wire [15:0] plus, lowbits, done, mod_bar, check;

      fulladder_16 add(.out(plus), .xin(x), .yin(y), .cin(cin), .cout());
      invert_16 inverter(mod_bar, mod);
      and16 hihnbits(check, plus, mod_bar);
      and16 lownbits(done, plus, mod);
      or8 (cin, check[0], check[1], check[2], check[3], check[4], check[5],
           check[6], check[7], check[8], check[9], check[10], check[11],
           check[12], check[13], check[14], check[15]);
      compare_16 checkfordouble(isDouble, done, 16'b1111_1111_1111);
      mux16 fixdouble(.out(out), .high(16'b0), .low(done), .sel(isDouble));
    endmodule
Final Design Process Notes • Lessons learned: never tweak the schematics without retesting the Verilog first. • Considering total time spent during this phase, roughly half was on the “core” and the FSM, the rest on the “wrapper”.
Road to verification: C
2 examples of the high-level C implementation:

    Tyrion:~/Desktop/15525 nstohs$ ./prime 4 7
    round 1: (4 * 4 - 2) mod 127 = 14
    round 2: (14 * 14 - 2) mod 127 = 67
    round 3: (67 * 67 - 2) mod 127 = 42
    round 4: (42 * 42 - 2) mod 127 = 111
    round 5: (111 * 111 - 2) mod 127 = 0
    2^7 -1 is prime

    Tyrion:~/Desktop/15525 nstohs$ ./prime 4 11
    round 1: (4 * 4 - 2) mod 2047 = 14
    round 2: (14 * 14 - 2) mod 2047 = 194
    round 3: (194 * 194 - 2) mod 2047 = 788
    round 4: (788 * 788 - 2) mod 2047 = 701
    round 5: (701 * 701 - 2) mod 2047 = 119
    round 6: (119 * 119 - 2) mod 2047 = 1877
    round 7: (1877 * 1877 - 2) mod 2047 = 240
    round 8: (240 * 240 - 2) mod 2047 = 282
    round 9: (282 * 282 - 2) mod 2047 = 1736
    2^11 -1 is not prime
Road to verification: Verilog
Samples of Verilog verification output. Tests were either specific tests on important units such as Partial_Product, or our top-level tests.

Partial Product Unit, p = 7:

    380 ppOut= 56, x= 14, y= 14, i= 2, mod= 127, p= 7
    400 ppOut= 112, x= 14, y= 14, i= 3, mod= 127, p= 7
    420 ppOut= 0, x= 14, y= 14, i= 4, mod= 127, p= 7
    440 ppOut= 0, x= 14, y= 14, i= 5, mod= 127, p= 7

Top Level, p = 7:

    itrOut= x
    itrOut= 4
    itrOut= 14
    itrOut= 67
    itrOut= 42
    itrOut= 111
    itrOut= 0

Top Level, p = 11:

    itrOut= x
    itrOut= 4
    itrOut= 194
    itrOut= 788
    itrOut= 701
    itrOut= 119
    itrOut= 1877
    …

Note that these are the same results generated by the C code.
Road to verification: Schematic I. Schematic test of our modular adder: (128 + 68) mod 127 = 69.
Road to verification: Schematic II. Plot of the top-level output after a single iteration, p = 7. The output after a single iteration is 14, the expected value.
Road to verification: Schematic III. The simulation outputs after a full run, showing the results of all iterations. Simulations start taking a long time. More on that later.
Road to verification: Intermission
Disk space required for a full-length schematic test of p=7: 6 GB
Time required for a full-length schematic test of p=7: 4 hours
Disk space required for a full-length extracted test of p=7: more
Time required for a full-length extracted test of p=7: longer
Disk space required for a full-length extractedRC test of p=7: 1 iPod
Time required for a full-length extractedRC test of p=7: T_T
Simulations become very demanding and lengthy because tests need to be “deep” to be useful. To meet such demands, be sure to use Genuine AMD© CPUs.
Road to verification: Layout I. 3 words: “the net-lists match”. Of course, there is far more to be concerned about. Due to simulator issues, layout simulations were delayed on some major modules. Partial product sims in progress (I hope).
Road to verification: Layout II. Top-level layout sims in progress.
Road to verification: Timing. Pathmill was useful to help us gauge our critical path, which is one cycle through our modulo multiplier. When run on the top level, a critical path of 12.703 ns was found. This was in the ballpark relative to our research. Layout timing sims in progress.
Issues
• extractedRC of partial_product module
• Registers switch
• Switching from parallel calculations to series
  – Transistor count vs. clock cycles
• Syncing up design between people
  – Transferring files
  – Different design styles
• LONG simulation times
• Floorplanning
  – Too much emphasis on aspect ratios and not enough on wiring
  – Couldn’t decide on one set floorplan
Floorplan v1.0 [figure: blocks Mod Multiplier, Prime Logic, Memory, Mod Adder, FSM]
Floorplan v2.0
Floorplan v3.0
Floorplan v4.0
Floorplan v5.0
Final Floorplan
Pin Specs

| Pin | Type | # of Pins |
|---|---|---|
| Vdd! | In/Out | 1 |
| Gnd! | In/Out | 1 |
| p<0:15> | In | 16 |
| clk | In | 1 |
| start | In | 1 |
| Done | Out | 1 |
| out | Out | 1 |
| Total | - | 22 |
Initial Part Specs

| Module | Transistor Count | Area (µm²) | Transistor Density (per µm²) |
|---|---|---|---|
| FSM | 300 | 900 | 0.33 |
| mod_p | 2,440 | 7,000 | 0.35 |
| mod_add | 1,282 | 9,000 | 0.14 |
| partial_product | 8,676 | 65,000 | 0.13 |
| count | 1,656 | 6,000 | 0.27 |
| sub_16 | 704 | 3,500 | 0.20 |
| Registers | 1,848 | 6,000 | 0.30 |
| compare | 36 | 300 | 0.12 |
| Total | 16,942 | 97,700 | 0.17 |
Final Part Specs

| Module | Transistor Count | Area (µm²) | Transistor Density (per µm²) | Aspect Ratio |
|---|---|---|---|---|
| FSM | 152 | 1,200 | 0.13 | 2.45 |
| mod_p | 1,280 | 8,603 | 0.15 | 0.79 |
| mod_add | 1,168 | 5,603 | 0.21 | 2.40 |
| partial_product | 7,520 | 54,680 | 0.14 | 1.16 |
| count | 1,424 | 8,701 | 0.16 | 6.88 |
| sub_16 | 576 | 2,934 | 0.20 | 4.49 |
| Registers | 896 | 6,028 | 0.15 | 4.76 |
| compare | 56 | 201 | 0.28 | 4.41 |
| Total | 13,702 | 86,621 | 0.16 | 1.01 |
Chip Specs
• Transistor Count: 13,702
• Size: 296.51 µm x 292.13 µm
• Area: 86,621 µm²
• Aspect Ratio: 1.01:1
• Density: 0.16 transistors/µm²
Final Floorplan
Final Floorplan
Poly Layer Density: 7.14%
Active Layer Density: 8.76%
Metal 1 Layer Density: 23.86%
Metal 2 Layer Density: 19.97%
Metal 3 Layer Density: 11.30%
Metal 4 Layer Density: 10.34%
Conclusions
• Plan for buffers
  – Can’t put them in after the fact
• Your design will change dramatically from start to finish, so be flexible
• Communication is key
• Do layout in parallel