Calorimeter Algorithm Firmware
Calorimeter Trigger Upgrade Firmware
Michael Schulte, Katherine Compton, Tony Gregerson, Ben Buchli, and Amin Farmahini-Farahani
U. Wisconsin - Madison
February 19, 2009
In collaboration with Wesley Smith, Sridhara Dasu, Michail Bachtis, Kevin Flood, Tom Gorski, David Hinkemeyer, Shuvra Bhattacharyya, William Plishker, George Zaki, Nimish Sane, and Soujanya Kedilaya
U. Wisconsin, February 19, 2009, Producing FPGA Firmware

Introduction
• Motivation and Goals
• Design Platform and Methodology
• Preliminary Designs and Results
  • Input RocketIO and Input Buffering
  • Particle Cluster Finder
  • Cluster Overlap Filter
• Planned Implementation on the Calorimeter Trigger Prototype
• Planned Tools and Techniques

Motivation and Goals
• The upgraded Calorimeter Trigger will require new algorithms
• Modern FPGAs provide efficient platforms for these algorithms
• Implement the Calorimeter Trigger using
  • A unified design platform
  • Unified design and test methodologies
  • Techniques that facilitate future upgrades
• Start by implementing a baseline design for the new algorithms

Initial Design Platform
• Xilinx Virtex-5 devices contain
  • Virtex-5 Slices (4 LUTs and 4 flip-flops)
  • DSP48E Slices (multiplier, adder, and accumulator)
  • Block RAM (36 Kbits)
  • RocketIO Transceivers
    • GTP transfers up to 3.75 Gbps
    • GTX transfers up to 6.50 Gbps
• Initial designs synthesized for
  • Xilinx Virtex-5 LX110T and TX240T FPGAs

FPGA     Virtex-5 Slices   DSP48E Slices   Block RAM (Kbits)   RocketIO Transceivers
LX110T   17,280            64              5,328               16 GTP
TX240T   37,440            96              11,664              48 GTX

Initial Design Methodology
• Designs start with the algorithms
  • Physicists and engineers collaborate
  • Evaluate algorithm/implementation tradeoffs
• Designs specified using
  • VHDL, Verilog, and Xilinx Core Generator
• Designs implemented and tested using
  • Xilinx ISE v10.1
  • ModelSim Xilinx Edition v6.3
• Gather results for
  • Input RocketIO, input buffering, particle cluster finder, and cluster overlap filter

RocketIO and Buffering
[Diagram: two serial RocketIO tower inputs enter a GTX Dual Tile driven by the Ref. Clock (640 MHz); each link fills eight 16-bit registers at Ref. Clock/2 (320 MHz); the input buffers (sixteen 16-bit registers) run at Ref. Clock/16 (40 MHz) and supply ECAL/HCAL Et [0] through Et [14] plus the ECAL finegrain bits as Particle Cluster Finder inputs]
• Our initial design on TX240T FPGAs uses Xilinx’s Aurora protocol for RocketIO inputs
• Each GTX Dual Tile de-serializes 2 x 8 x 16 = 256 bits every 25 ns
• Sixteen 16-bit registers store data for 15 towers for 25 ns
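The bit budget on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the constant names are illustrative, and the payload rate ignores any protocol overhead, which the slides do not quantify.

```python
# Bit budget for one GTX Dual Tile, using only figures quoted on the slide.
LINKS_PER_DUAL_TILE = 2   # two serial RocketIO inputs per dual tile
WORDS_PER_LINK = 8        # eight 16-bit registers per link
BITS_PER_WORD = 16
BUNCH_CROSSING_NS = 25    # one period of the 40 MHz clock

# Bits de-serialized by one dual tile per 25 ns interval.
bits_per_crossing = LINKS_PER_DUAL_TILE * WORDS_PER_LINK * BITS_PER_WORD

# Number of 16-bit buffer registers needed to hold those bits.
registers_needed = bits_per_crossing // BITS_PER_WORD

# Payload rate per link in Gbps (bits per ns); protocol overhead is ignored.
payload_gbps_per_link = WORDS_PER_LINK * BITS_PER_WORD / BUNCH_CROSSING_NS

print(bits_per_crossing)      # 256
print(registers_needed)       # 16
print(payload_gbps_per_link)  # 5.12
```

The 5.12 Gbps payload per link sits below the 6.50 Gbps GTX limit quoted on the design platform slide.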

RocketIO and Buffering
• Each pair of RocketIO links provides 17-bit input data for 15 towers every 25 ns
• A 10 x 10 grid requires 14 RocketIO links
• A 17 x 17 grid requires 40 RocketIO links

Virtex-5 Resource Utilization for RocketIO and Input Buffering on TX240T FPGA

Resource          10 x 10 Grid   17 x 17 Grid
RocketIO Links    29%            83%
Virtex-5 Slices   3%             8%
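The link counts and the RocketIO utilization row follow from the per-pair capacity. A minimal sketch, assuming links are always allocated in pairs and that the 48 GTX transceivers of the TX240T are the denominator; `links_required` is an illustrative helper name.

```python
import math

TOWERS_PER_LINK_PAIR = 15   # from the slide
BITS_PER_TOWER = 17         # 8 ECAL Et + 8 HCAL Et + 1 finegrain
BITS_PER_PAIR = 256         # 2 x 8 x 16 bits every 25 ns
GTX_LINKS_ON_TX240T = 48    # from the design platform table

def links_required(grid_side):
    """Number of RocketIO links for a square grid, allocated in pairs."""
    towers = grid_side * grid_side
    pairs = math.ceil(towers / TOWERS_PER_LINK_PAIR)
    return 2 * pairs

# 15 towers x 17 bits = 255 bits, which fits in the 256 bits per pair.
assert TOWERS_PER_LINK_PAIR * BITS_PER_TOWER <= BITS_PER_PAIR

for side in (10, 17):
    links = links_required(side)
    percent = round(100 * links / GTX_LINKS_ON_TX240T)
    print(side, links, percent)
# 10 14 29
# 17 40 83
```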

Particle Cluster Finder
• Process data in 2 x 2 clusters of towers
• Inputs: 17 bits per tower
  • 8 ECAL Et bits
  • 8 HCAL Et bits
  • 1 ECAL finegrain bit
• Algorithm is applied on overlapping clusters
  • Step of one tower
• Identify if cluster contains “useful” particle energy
  • Eliminate some noise
  • Detect particle type
[Diagram: 2 x 2 tower cluster (4 x 17 bits); each 17-bit tower passes through a threshold that yields 1 bit; a pattern comparator and pattern decision check ask "match?"; no: zero (38 bits); yes: finegrain OR (1 bit), EPIM (1 bit), tower energy sums (4 x 9 = 36 bits)]
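Applying the algorithm on overlapping clusters with a step of one tower amounts to a sliding 2 x 2 window. The sketch below assumes the grid does not wrap around at its edges, which the slides do not state; `overlapping_clusters` is an illustrative name.

```python
def overlapping_clusters(grid):
    """Yield ((row, col), towers) for every 2 x 2 window, stepping one tower.

    `grid` is a list of rows; (row, col) is the window's top-left tower.
    Assumes no wrap-around at the grid edges.
    """
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - 1):
        for c in range(cols - 1):
            towers = (grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1])
            yield (r, c), towers

# A 3 x 3 grid of tower labels produces (3 - 1) ** 2 = 4 overlapping clusters.
grid = [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]
clusters = list(overlapping_clusters(grid))
print(len(clusters))   # 4
print(clusters[0])     # ((0, 0), (0, 1, 3, 4))
```

Under this assumption an N x N grid yields (N - 1) x (N - 1) clusters, and each interior tower appears in four of them.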

Algorithm
• Input tower data
• Apply threshold
  • Boolean result, single bit per tower
• Compare Boolean tower pattern to stored patterns
  • No match: output 38 zeros
  • Match: output 38 bits
    • OR of the finegrain bits
    • e/γ compatibility bit
    • Energy sums
      • 4 Towers (4 x 9 bits, E+H)
[Diagram: 2 x 2 tower cluster (4 x 17 bits) feeding per-tower thresholds (1 bit each), pattern comparator, and pattern decision check; outputs are zero (38 bits) on no match, otherwise finegrain OR (1 bit), EPIM (1 bit), and tower energy sums (4 x 9 = 36 bits)]
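The steps above can be sketched as a single function. Several details are assumptions rather than facts from the slides: the threshold is applied to the E+H sum with a strict ">", the set of stored patterns and the `epim` callable are placeholders supplied by the caller, and the order in which fields are packed into the 38-bit word is arbitrary.

```python
def find_cluster(towers, threshold, patterns, epim):
    """Return the 38-bit output word for one 2 x 2 cluster.

    towers:    four (ecal_et, hcal_et, finegrain) tuples (8 + 8 + 1 bits each)
    threshold: hypothetical per-tower threshold on the E+H sum
    patterns:  set of allowed 4-tuples of booleans (hypothetical contents)
    epim:      stand-in callable returning the e/gamma compatibility bit
    """
    sums = [e + h for e, h, _ in towers]           # 9-bit E+H sum per tower
    pattern = tuple(s > threshold for s in sums)   # one Boolean per tower
    if pattern not in patterns:
        return 0                                   # no match: 38 zeros
    finegrain_or = int(any(fg for _, _, fg in towers))
    egamma = int(bool(epim(towers)))
    word = (finegrain_or << 37) | (egamma << 36)   # assumed field order
    for i, s in enumerate(sums):
        word |= s << (9 * (3 - i))                 # four 9-bit energy sums
    return word

towers = [(10, 5, 0), (0, 0, 1), (200, 55, 0), (0, 0, 0)]
patterns = {(True, False, True, False)}
word = find_cluster(towers, 8, patterns, lambda t: True)
no_match = find_cluster(towers, 8, set(), lambda t: True)
```

Four 9-bit sums plus the two flag bits give the 38 bits quoted on the slide.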

Electron/Photon Identification
• The electron/photon identification module (EPIM)
  • Is the most complex module in the particle cluster finder
  • Currently sets the e/γ compatibility bit if [condition shown as an equation on the slide; not preserved]
• Various implementations were investigated
  • Multiplier based – can easily change Egamma_Threshold
  • Static tables – reconfigure FPGA to change EPIM algorithm
  • Dynamic tables – change EPIM algorithm by reloading table
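The difference between the multiplier-based and table-based options can be illustrated in software. The slides do not preserve the actual EPIM condition, so `stand_in_rule` below is a purely hypothetical predicate, and treating the inputs as one 8-bit ECAL value and one 8-bit HCAL value is also an assumption. The point of the sketch is only that any such predicate can be evaluated arithmetically or precomputed into a table that is looked up.

```python
def stand_in_rule(ecal_et, hcal_et, egamma_threshold=32):
    """Hypothetical predicate, NOT the EPIM condition from the slides.

    Evaluated with multiplications, so changing `egamma_threshold`
    changes the decision without rebuilding anything.
    """
    return hcal_et * 256 <= ecal_et * egamma_threshold

def build_table(predicate, bits=8):
    """Precompute the predicate for every pair of `bits`-wide inputs."""
    size = 1 << bits
    return [[predicate(e, h) for h in range(size)] for e in range(size)]

# Table-based evaluation: one lookup per decision. Replacing the table
# contents changes the algorithm without touching the lookup logic.
table = build_table(stand_in_rule)
entries = sum(len(row) for row in table)
print(entries)  # 65536
```

With two 8-bit inputs the table holds 65,536 one-bit entries. A static table fixes those entries at configuration time, while a dynamic table allows them to be reloaded.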

Particle Cluster Finder
Resource Usage for a Single EPIM on TX240T FPGA
(usage columns named on the slide: Slice Register Usage, Slice LUT Usage, BRAM Usage; figures for each row are listed in the order they appear)

Category         Type              Primary Resource   Usage figures
Multipliers      DSP-Based         DSP Block          0.02%, ---, 1.0%
                 LUT-Based         Logic Slice        0.08%, 0.04%, ---
                 Hybrid            DSP Block          0.01%, ---, 4.2%
Static Tables    LUT Tree          Logic Slice        0.01%, 14.6%, ---
                 Distributed ROM   Logic Slice        0.01%, 0.12%, ---
Dynamic Tables   Full              BRAM               0.01%, 0.02%, 9.8%, ---
                 Partial           BRAM               0.01%, 0.02%, 0.62%, ---

Particle Cluster Finder
Frequencies and maximum grid sizes for Particle Cluster Finder on TX240T FPGA

Category         Type              Max Freq. (MHz)   Actual Freq. (MHz)   Max EPIMs   Max Grid (w/o I/O)   Max Grid (w/ I/O)
Multipliers      DSP-Based         370               200                  96          22 x 22              17 x 17
                 LUT-Based         440               200                  1060        73 x 73              17 x 17
                 Hybrid            320               200                  24          11 x 11
Static Tables    LUT Tree          88                80                   5           4 x 4
                 Distributed ROM   270               200                  630         57 x 57              17 x 17
Dynamic Tables   Full              390               200                  10          8 x 8
                 Partial           450               200                  162         29 x 29              17 x 17
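The slides do not say how the "Max Grid (w/o I/O)" column was derived. One reading that happens to reproduce all seven rows is sketched below; it assumes that each EPIM is time-multiplexed over several clusters per 25 ns bunch crossing (actual clock divided by 40 MHz) and that an N x N grid contains (N - 1) x (N - 1) overlapping 2 x 2 clusters. Both assumptions, and the name `max_grid_side`, are reconstructions rather than statements from the source.

```python
import math

BUNCH_CROSSING_MHZ = 40  # 25 ns interval quoted on the buffering slides

def max_grid_side(max_epims, clock_mhz):
    """Largest N such that (N - 1) ** 2 clusters fit in one bunch crossing."""
    clusters_per_crossing = max_epims * (clock_mhz // BUNCH_CROSSING_MHZ)
    return math.isqrt(clusters_per_crossing) + 1

# (Max EPIMs, Actual Freq. in MHz) per implementation type, from the table.
rows = {
    "DSP-Based": (96, 200),
    "LUT-Based": (1060, 200),
    "Hybrid": (24, 200),
    "LUT Tree": (5, 80),
    "Distributed ROM": (630, 200),
    "Full": (10, 200),
    "Partial": (162, 200),
}
sides = {name: max_grid_side(epims, mhz) for name, (epims, mhz) in rows.items()}
for name, side in sides.items():
    print(f"{name}: {side} x {side}")
```

For the DSP-Based row, for example, 96 EPIMs x 5 clusters per crossing = 480 clusters, and 21 x 21 = 441 fits while 22 x 22 = 484 does not, giving a 22 x 22 grid.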

Particle Cluster Finder
• Synthesized for a 200 MHz clock (5 ns cycle time)
• Latency of nine cycles (45 ns @ 200 MHz)

Resource utilization for Particle Cluster Finder with Partial Dynamic Tables on TX240T FPGA

Resource          10 x 10 Grid   17 x 17 Grid
Virtex-5 Slices   12%            39%
BRAMs             19%            53%

Cluster Overlap Filter
• Applied on clusters produced by the Particle Cluster Finder
• Ensure that a tower only “belongs” to a single cluster
• Input: 9 clusters
  • A central cluster
  • The 8 neighboring clusters
• Determine to which cluster each tower should belong
  • Keep towers in clusters with the most energy
  • Prune towers from other clusters
[Diagram: a central cluster surrounded by eight neighbor clusters (NE, E, SE, S, SW, W, NW, N), 38 bits per input; legend: cluster origin (holds all cluster info), central cluster, pruned tower, neighbor cluster]

Algorithm
• For each “central” cluster,
  • Consider each neighbor
    • If central Et < neighbor Et, neighbor cluster is “stronger”
      • Remove overlapping towers from central cluster
    • Otherwise central cluster is “stronger”
      • Remove overlapping towers from neighbor
  • If no towers removed from central cluster, set its “central” bit
• Next apply threshold to cluster energy
• Output: 14 bits
  • 11 bits of cluster energy, 1 Finegrain bit, 1 e/γ bit, 1 central bit
[Diagram: central cluster with neighbor clusters NE, E, SE, S, SW, W, NW, N, 38 bits per input; legend: cluster origin (holds all cluster info), central cluster, pruned tower, neighbor cluster]
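The pruning rule can be sketched from the central cluster's point of view. The sketch makes several assumptions not stated on the slides: the cluster origin is the top-left tower with rows increasing southward, neighbors are compared using their full (unpruned) Et, every comparison is a strict "<" so ties are left unbroken, and a cluster at or below the threshold has its energy set to zero. The names `NEIGHBOR_OFFSETS` and `filter_central` are illustrative.

```python
# (d_row, d_col) of each neighbor's origin relative to the central origin.
# Assumes N is one row up and E is one column to the right.
NEIGHBOR_OFFSETS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}

def filter_central(central, neighbor_et, threshold):
    """Prune a central 2 x 2 cluster against its eight neighbors.

    central:     2 x 2 list of 9-bit tower energy sums
    neighbor_et: dict mapping neighbor name to that cluster's total Et
    threshold:   hypothetical cluster-energy threshold
    Returns (energy, central_bit).
    """
    central_et = sum(sum(row) for row in central)
    kept = [row[:] for row in central]
    pruned = False
    for name, (dr, dc) in NEIGHBOR_OFFSETS.items():
        if central_et < neighbor_et[name]:          # neighbor is "stronger"
            for r in range(2):
                for c in range(2):
                    # Tower (r, c) overlaps the neighbor if its position
                    # relative to the neighbor's origin lies inside 2 x 2.
                    if 0 <= r - dr <= 1 and 0 <= c - dc <= 1:
                        kept[r][c] = 0
                        pruned = True
    energy = sum(sum(row) for row in kept)
    if energy <= threshold:                         # assumed threshold policy
        energy = 0
    return energy, not pruned

central = [[10, 20], [30, 40]]
weak = {name: 50 for name in NEIGHBOR_OFFSETS}
strong_east = dict(weak, E=150)
print(filter_central(central, weak, 0))         # (100, True)
print(filter_central(central, strong_east, 0))  # (40, False)
```

Four 9-bit tower sums add up to at most 4 x 511 = 2044, which fits in the 11-bit cluster energy quoted in the output format.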

Cluster Overlap Filter Design
[Block diagram: inputs Central, NE, E, SE, S, SW, W, NW, N; energy adders produce 11-bit cluster sums; comparators (Central < NE?, Central <= SE?, Central <= SW?, Central < W?, Central < NW?, Central < N?) each produce 1 bit and gate the tower energies E1 to E4 (tower bit sequence, 4 x 9 bits); a final energy adder computes E1+E2+E3+E4; cluster threshold E > X?; outputs: Finegrain and e/γ (2 bits), Central (1 bit), Energy (11 bits)]

Cluster Overlap Filter Results
• Cluster Overlap Filter
  • Synthesized for a 200 MHz clock (cycle time of 5 ns)
  • Latency of five cycles (25 ns @ 200 MHz)
  • Operates in parallel with EPIM
  • No DSP48E or Block RAM resources needed

Virtex-5 Slice Utilization for Cluster Overlap Filter

FPGA     10 x 10 Grid   17 x 17 Grid
LX110T   18%            58%
TX240T   8%             27%

Latency Estimates
• Estimated latencies are given in the table below
  • Clock rate of 200 MHz (cycle time of 5 ns)
  • Cluster Overlap Filter operated in parallel with part of Particle Cluster Finder

Estimated Latencies on TX240T FPGAs

Component                            Latency (cycles)   Latency (ns)
Input RocketIO                       10                 50
Input Buffers                        5                  25
Particle Finder and Overlap Filter   9                  45
Total Estimated Latency              24                 120
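The total in the table is the sum of the three stages at 5 ns per cycle. A short check, using only the table's figures (the dictionary name is illustrative):

```python
CYCLE_NS = 5  # 200 MHz clock

latency_cycles = {
    "Input RocketIO": 10,
    "Input Buffers": 5,
    "Particle Finder and Overlap Filter": 9,
}

total_cycles = sum(latency_cycles.values())
total_ns = total_cycles * CYCLE_NS
print(total_cycles, total_ns)  # 24 120
```

The Cluster Overlap Filter's own five cycles do not add to the total because it runs in parallel with part of the nine-cycle Particle Cluster Finder.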

Overall Resource Estimates
• Estimated resources are given in the table below
  • Includes input RocketIO, input buffers, particle finder, and overlap filter
• Additional grid sizes and FPGA devices should be considered

Overall Resource Utilization on TX240T FPGA

Resource          10 x 10 Grid   17 x 17 Grid
RocketIO Links    29%            83%
Virtex-5 Slices   23%            74%
Block RAMs        19%            53%
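The slice totals agree with the sum of the per-component TX240T figures reported on the earlier slides. The check below simply adds those percentages; treating the overall figure as a plain sum is an assumption, although it matches the table for both grid sizes.

```python
# Virtex-5 slice utilization on the TX240T as (10 x 10 grid, 17 x 17 grid),
# taken from the per-component tables on the earlier slides.
slice_percent = {
    "RocketIO and input buffering": (3, 8),
    "Particle Cluster Finder": (12, 39),
    "Cluster Overlap Filter": (8, 27),
}

total_10x10 = sum(small for small, _ in slice_percent.values())
total_17x17 = sum(large for _, large in slice_percent.values())
print(total_10x10, total_17x17)  # 23 74
```

The RocketIO link and Block RAM rows each come from a single component (input links and the Particle Cluster Finder, respectively), so they carry over unchanged.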

Calorimeter Trigger Prototype
• Implement the rest of the Calorimeter Trigger
  • Particle Isolation and Particle ID
  • Jet Reconstruction
  • Particle Sorter
  • MET, HT, MHT Calculation
• Perform more in-depth testing and analysis of the designs
• Enhance the initial designs
• Prototype the Calorimeter Trigger designs

New Tools and Techniques
• We are working with U. of Maryland researchers to investigate new tools and techniques to design, test, and upgrade the CMS firmware
• Dataflow languages
  • DIF and OpenDF
• Tools and techniques for
  • Unit testing and automated testing
  • Efficient designs with multiple FPGAs
  • Generating FPGA firmware and simulator code from a single high-level specification
  • Web-based repositories and version tracking
  • Consistent (automated) documentation practices

Conclusions
• The preliminary firmware for the Calorimeter Trigger Upgrade has been developed
  • Initial results look promising
  • Additional designs are planned for this spring and summer
• Still need to work on
  • Making the designs more easily upgradable
  • Experimenting with new algorithms
  • Helping to establish a unified platform plus unified design and test methodologies
  • New tools and techniques to facilitate future firmware development and upgrades