FPGAs
K. Elliott Fleming
Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
10/2/2020, http://csg.csail.mit.edu/6.375, L11-01

FPGA: A Sea of Resources
[Figure: FPGA fabric with logic blocks, PLLs, a processor, multipliers, SRAM, I/O pads, and clock buffers]

What can we build?
The DE2-70 and DE3 columns list the resources available on each board's FPGA. The 802.11a/g and SMIPS v2 columns list the resources each design uses. SRAM entries give the number of blocks, with the block size in parentheses.

Resource          DE2-70     DE3         802.11a/g   SMIPS v2
Logic elements    68416      135200      85924       6501
Registers         70234      270400      42107       2841
SRAM              250 (4K)   1040 (9K)   265 (9K)    226 (4K)
Multipliers       300        576         321         0
Clock buffers     16         32          7           5
PLLs              4          8           1           0
Lines of code     -          -           8762        1603

Very complex systems

Logic Block: Building functionality
[Figure: logic block with labels combinational input, look-up table, adder (+) with carry in and carry out, muxing logic, combinational output]

Slice: Look-up Table
• Program the flip-flops, then use the combinational inputs to select one of them (through the muxing logic) as the combinational output: arbitrary logic
• Can we make a ROM?
• Can we make a RAM? Just add enable logic (an enable demux)
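The slide's idea is that one structure serves several purposes. Programmed flip-flops selected by the inputs give arbitrary logic, or a ROM. Adding a write-enable demux turns the same table into a RAM. Below is a minimal Python model of that idea. It assumes a single-bit output and treats the first input as the least significant select bit. The `LUT` class and its methods are hypothetical names used only for illustration.

```python
class LUT:
    """Sketch of an n-input look-up table built from 2**n programmable bits."""

    def __init__(self, n_inputs: int, truth_table: list[int]):
        assert len(truth_table) == 2 ** n_inputs
        self.n = n_inputs
        self.bits = [b & 1 for b in truth_table]  # the programmed flip-flops

    def _index(self, inputs) -> int:
        # Inputs act as mux select lines; input i contributes bit i (assumption).
        return sum((x & 1) << i for i, x in enumerate(inputs))

    def read(self, *inputs) -> int:
        # Combinational output: the flip-flop selected by the inputs.
        # This is arbitrary logic, or a ROM read if the inputs are an address.
        assert len(inputs) == self.n
        return self.bits[self._index(inputs)]

    def write(self, value: int, *inputs) -> None:
        # RAM mode: the enable demux steers the write to the addressed flip-flop.
        assert len(inputs) == self.n
        self.bits[self._index(inputs)] = value & 1
```

For example, `LUT(3, [bin(i).count("1") % 2 for i in range(8)])` programs a 3-input XOR, and `read(1, 1, 1)` returns 1. The same class can model a 4x1 RAM when built as `LUT(2, [0] * 4)`: after `write(1, 0, 1)`, `read(0, 1)` returns 1.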

Reconfigurable Wiring
• 2D mesh grid
  - Local connections made by driving powerful transistors
  - Switches route across dimensions
• Heterogeneous wire length
  - Many wires to nearby cells
  - Few long-length wires
[Figure: grid of logic blocks connected through switches]

SMIPS System

SMIPS Infrastructure

SMIPS Infrastructure
• Bus interface logic
  - Avalon master/slave
• CBus devices
  - mkCBusWideRegRW(addr, reg);
  - Many interfaces (Get, RegFile, etc.)
  - Mechanism for building the memory map automatically
  - Some C drivers included
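The slide says the infrastructure builds the memory map automatically and includes some C drivers, but it does not show how. The following is a hypothetical Python sketch of the general idea, not the actual Bluespec library. It assigns consecutive word addresses as registers are added and emits `#define`s that a C driver could include. The class name, the 32-bit bus word, and the rule that a wider register occupies several consecutive words are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class RegEntry:
    name: str
    width_bits: int
    addr: int  # word address assigned by the builder


class MemoryMapBuilder:
    """Hypothetical helper: hands out consecutive bus-word addresses."""

    def __init__(self, base: int = 0, word_bits: int = 32):
        self.next_addr = base
        self.word_bits = word_bits  # assumed bus word width
        self.entries: list[RegEntry] = []

    def add_reg(self, name: str, width_bits: int) -> int:
        # Assumption: a register wider than one bus word takes several words.
        words = -(-width_bits // self.word_bits)  # ceiling division
        entry = RegEntry(name, width_bits, self.next_addr)
        self.entries.append(entry)
        self.next_addr += words
        return entry.addr

    def c_header(self) -> str:
        # One #define per register, for use by C driver code.
        return "\n".join(
            f"#define {e.name.upper()}_ADDR 0x{e.addr:04x}" for e in self.entries
        )
```

Adding `status` (32 bits), `key` (128 bits), and `ctrl` (8 bits) in that order yields addresses 0, 1, and 5. Software and hardware then agree on the layout because both are generated from the same list.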

Demonstration
• Synplify Pro
• Quartus II
• Nios II IDE

Cryptosort: Think Different
• Large (0.5 GB) encrypted database
  - Decrypt database
  - Sort database on key
  - Encrypt database
• Do it fast, on an FPGA
  - Design principles differ from ASIC
  - Must be aware of FPGA hardware
• Joint work with Myron King and Man Cheuk Ng

Problem
• Encrypted records in external memory (DRAM)
• Sort records in ascending order
• Decrypt the database with AES; encrypt the sorted records with AES
[Figure: records move between DRAM and the Cryptosorter]

Cryptosort Architecture
[Figure: block diagram with labels PLB, PPC, PLB master, DRAM, feeder, and function unit: sort tree]
• Use merge sort: O(n log(n))

Engineering the Merge Tree
• Probably optimal for ASIC
• Easy to parameterize and build
• Each level merges 2n streams into n streams
[Figure: binary tree of < comparators]
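The following is a software sketch of the merge tree. Each `<` node is a 2-to-1 merge of two ascending streams. Each level turns 2n streams into n streams until one sorted stream remains. It models only the data flow, not the hardware timing. The function names are hypothetical.

```python
from collections import deque


def merge2(a, b):
    """2-to-1 merge of two ascending sequences: the work of one '<' node."""
    a, b = deque(a), deque(b)
    out = []
    while a and b:
        # Compare the heads and emit the smaller one.
        out.append(a.popleft() if a[0] <= b[0] else b.popleft())
    out.extend(a)  # at most one of the two is still non-empty
    out.extend(b)
    return out


def merge_tree(streams):
    """Merge 2**k sorted streams into one, level by level (2n streams -> n)."""
    assert streams and len(streams) & (len(streams) - 1) == 0, "need a power of two"
    level = [list(s) for s in streams]
    while len(level) > 1:
        level = [merge2(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

For example, `merge_tree([[1, 5], [2, 6], [0, 7], [3, 4]])` returns `[0, 1, 2, 3, 4, 5, 6, 7]`. In a full merge sort, the tree is applied to sorted runs that grow longer with each pass.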

Refining the Module
• Naïve implementation: exponential resource usage
  - Each comparator takes 3% of slices
  - At most, fit 3 levels
• Key observation:
  - Throughput is rate-limited by the final 2-to-1 merge step
  - This means each level only needs to perform one comparison per cycle
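Here is why one comparison per level is enough. Every record passes through exactly one node at each level on its way to the root. If the final 2-to-1 merge emits at most one record per cycle, then each level as a whole needs at most one comparison per cycle on average. The arithmetic below compares two layouts for an L-level tree: one comparator per node (2**L - 1 in total) versus one shared comparator per level (L in total). The function name is hypothetical.

```python
def comparators_needed(levels: int) -> tuple[int, int]:
    """Comparator counts for an L-level binary merge tree: (per node, per level)."""
    # Level i (level 0 = the final 2-to-1 merge) has 2**i merge nodes.
    per_node = sum(2 ** i for i in range(levels))  # equals 2**levels - 1
    per_level = levels  # one shared comparator per level
    return per_node, per_level


for levels in range(1, 7):
    per_node, per_level = comparators_needed(levels)
    print(f"{levels} levels: {per_node:2d} comparators unshared, {per_level} shared")
```

At 6 levels, this is 63 comparators versus 6. Sharing turns exponential growth into linear growth, and the price is the scheduling logic on the next slide.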

Sharing the Comparator: Idea
• Loop:
  - Choose a non-empty input pair whose corresponding output FIFO has room (scheduling)
  - Compare the FIFO heads
  - Dequeue the smaller one and put it on the output FIFO
• We save area by having one comparator per level
• But we introduce a scheduling problem
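The loop above can be modeled cycle by cycle. In this sketch, each level owns one comparator and performs at most one merge step per cycle. It chooses round-robin among pairs whose inputs are both non-empty and whose output FIFO has room.

Several assumptions are not on the slide:
• Inputs are preloaded.
• Keys are finite numbers.
• Each stream ends with an `INF` end-of-stream marker, so the scheduler can tell "empty for now" from "finished".
• The root output is drained one record per cycle.

All function names are hypothetical.

```python
from collections import deque

INF = float("inf")  # end-of-stream marker (assumption: real keys are finite)


def step_level(ins, outs, cap, start):
    """One cycle of one level: at most one comparison on the shared comparator.

    ins has 2n FIFOs and outs has n; pair j = (ins[2j], ins[2j+1]) feeds outs[j].
    Returns the round-robin pointer to use on the next cycle.
    """
    n = len(outs)
    for k in range(n):
        j = (start + k) % n
        a, b = ins[2 * j], ins[2 * j + 1]
        # Schedule only a pair with both heads present and room in its output.
        if a and b and len(outs[j]) < cap:
            if a[0] == INF and b[0] == INF:
                a.popleft()
                b.popleft()
                outs[j].append(INF)  # both inputs finished: forward end-of-stream
            elif a[0] <= b[0]:
                outs[j].append(a.popleft())  # compare heads, move the smaller
            else:
                outs[j].append(b.popleft())
            return (j + 1) % n
    return start  # no eligible pair this cycle


def shared_merge_tree(streams, cap=2):
    """Merge 2**k sorted streams, one comparator per level; returns (records, cycles)."""
    assert streams and len(streams) & (len(streams) - 1) == 0, "need a power of two"
    fifos = [[deque(list(s) + [INF]) for s in streams]]  # preloaded inputs
    while len(fifos[-1]) > 1:
        # Internal FIFOs, each holding at most `cap` entries.
        fifos.append([deque() for _ in range(len(fifos[-1]) // 2)])
    n_levels = len(fifos) - 1
    ptr = [0] * n_levels
    result, cycles = [], 0
    while True:
        cycles += 1
        root = fifos[-1][0]
        if root:  # downstream consumer takes one record per cycle
            x = root.popleft()
            if x == INF:
                return result, cycles
            result.append(x)
        for lvl in reversed(range(n_levels)):
            ptr[lvl] = step_level(fifos[lvl], fifos[lvl + 1], cap, ptr[lvl])
```

For example, `shared_merge_tree([[3, 9], [1, 4], [2, 8], [5, 7]])` returns the sorted records `[1, 2, 3, 4, 5, 7, 8, 9]` together with the number of cycles taken. The model updates FIFOs as soon as a comparison fires. It therefore ignores the multi-cycle latency that the next slide addresses.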

Sharing the Comparator: Physical Implementation Issues
• Not enough registers
  - Each BRAM contains multiple FIFOs
• Aggressive clock
  - Single-cycle scheduling is impossible
  - Enqueue happens several cycles after scheduling
  - Credit-based flow control
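Credit-based flow control bridges the gap between scheduling and enqueue. The scheduler holds one credit per free slot in the output FIFO. It spends a credit when it schedules work and gets the credit back only when the consumer dequeues. Because slots are reserved up front, the FIFO cannot overflow even though the enqueue lands several cycles later. The Python sketch below uses a hypothetical class, an assumed fixed pipeline latency, and a consumer that drains every other cycle.

```python
from collections import deque


class CreditedFifo:
    """Output FIFO whose free space the scheduler tracks as credits."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.credits = capacity  # slots not yet promised to in-flight work
        self.data: deque = deque()

    def can_schedule(self) -> bool:
        return self.credits > 0

    def reserve(self) -> None:
        # Taken at scheduling time, several cycles before the enqueue lands.
        assert self.credits > 0
        self.credits -= 1

    def enq(self, x) -> None:
        # Arrives after the pipeline latency; its slot was already reserved.
        assert len(self.data) < self.capacity, "overflow: credits were wrong"
        self.data.append(x)

    def deq(self):
        # Freeing a slot returns one credit to the scheduler.
        self.credits += 1
        return self.data.popleft()


def simulate(n_items: int, capacity: int = 4, latency: int = 3, cycles: int = 200):
    fifo = CreditedFifo(capacity)
    in_flight = deque()  # (cycle the enqueue lands, item)
    next_item, received = 0, []
    for cycle in range(cycles):
        # Enqueues whose pipeline latency has elapsed land now.
        while in_flight and in_flight[0][0] <= cycle:
            fifo.enq(in_flight.popleft()[1])
        # Scheduler: issue new work only if a credit is available.
        if next_item < n_items and fifo.can_schedule():
            fifo.reserve()
            in_flight.append((cycle + latency, next_item))
            next_item += 1
        # Slow consumer: drains one entry every other cycle.
        if cycle % 2 == 1 and fifo.data:
            received.append(fifo.deq())
    return received
```

The invariant is credits + in-flight items + occupied slots = capacity. Checking only the FIFO's current occupancy at scheduling time would not be enough, because that check ignores items still in the pipeline.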

Layout