TMR Schemes. Melanie Berg, MEI Technologies/NASA GSFC. Melanie.D.Berg@NASA.gov
Overview. Premise: Why do various FPGAs require separate mitigation strategies? Topics: Radiation Effects in FPGA Devices; Mitigation and Actel Anti-fuse Devices; Mitigation and Xilinx Virtex Devices; Tools. European Space Agency FPGA Tool Workshop. Noordwijk, NL; Melanie Berg
Radiation Effects in FPGA Devices: Single Event Transients (SETs); Single Event Upsets (SEUs); Single Event Functional Interrupts (SEFIs).
Single Event Effects (SEEs) and IC System Error. SEUs or SETs can occur in: combinatorial logic; sequential logic; configuration memory cells. Depending on the device and the design, each fault type will have a probability of occurrence and will make either a significant or an insignificant contribution to system error. Every device has different error responses; we must understand the differences and design appropriately.
Combinatorial Logic Blocks and Potential Upsets: SETs in Anti-fuse FPGAs.
Basic Combinatorial Logic Blocks and Potential Upsets. [Figure labels] Transient: P_SET. Stuck until overwritten: probability of configuration fault, P_configuration.
DFFs: SEUs and SEFIs. [Figure: DFF with D, Q, CLK, and reset pins.] Strike caught in loop: probability of SEU, P_DFFSEU. Probability of SEFI: P_SEFI.
Transient Capture on a DFF Data Input Pin (SET→SEU). [Figure: clock waveform with period tp = 1/fs and a SET pulse of width T_pulse.]
fs: system frequency
T_pulse: SET pulse width
P(fs)_SETgen: probability that a SET is generated with sufficient amplitude
P(fs)_SETprop: probability that a SET can propagate with sufficient amplitude
P_DFFEn: probability that the DFF is enabled (active)
P(fs)_SET→SEU: probability that a SET can be caught by a clock edge
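As a rough illustration of how the terms on this slide might combine, the sketch below assumes that the capture probability is the product of the three listed probabilities and the fraction of the clock period covered by the pulse (T_pulse / tp = T_pulse * fs). The product form, the function name, and all numeric values are assumptions made for illustration; they are not taken from the slides.

```python
def set_to_seu_probability(fs_hz, pulse_width_s, p_set_gen, p_set_prop, p_dff_en):
    """Assumed product model for P(fs)_SET->SEU (illustrative only).

    fs_hz          system frequency fs, in Hz
    pulse_width_s  SET pulse width T_pulse, in seconds
    p_set_gen      probability a SET is generated with sufficient amplitude
    p_set_prop     probability the SET propagates with sufficient amplitude
    p_dff_en       probability the DFF is enabled (active)
    """
    # Fraction of the clock period tp = 1/fs covered by the pulse, capped at 1.
    capture_window = min(1.0, pulse_width_s * fs_hz)
    return p_set_gen * p_set_prop * p_dff_en * capture_window


# Hypothetical inputs: a 1 ns pulse and placeholder probabilities of 0.5.
for fs in (15e6, 120e6):
    p = set_to_seu_probability(fs, 1e-9, 0.5, 0.5, 0.5)
    print(f"fs = {fs / 1e6:.0f} MHz -> assumed capture probability {p:.3e}")
```

Under this assumed model the capture probability grows linearly with frequency until the pulse is as wide as the clock period, which is one way to read the statement that the error rate must reflect the frequency of operation.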
Frequency Effects and Conventional DFF Upset Theory. [Figure: composite cross section, P(fs)_DFFerror, versus frequency; curve labels: P_DFFSEU & P_DFFMBU, P(fs)_SET→SEU; axis label: ~0.]
Summary: Most Significant Factors of System Error Probability, P(fs)_error. Configuration: SRAM-based FPGAs. DFFs: static (SEU) and dynamic (SET→SEU). SEFIs: clocks and resets; inaccessible control circuitry.
Reducing System Error: Common Mitigation Techniques. Mitigation can be: (1) Embedded: built into the device library cells; the user does not verify the mitigation, the manufacturer does. (2) User inserted: part of the actual design process; the user must verify the mitigation. Complexity is a RISK! Common mitigation types: Local Triple Modular Redundancy (LTMR); Global Triple Modular Redundancy (GTMR).
Example Mitigation Schemes Will Use Majority Voting. [Figure: truth table of inputs I0, I1, I2 and the Majority Voter output.] The voter output takes the value held by at least two of the three inputs.
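The majority function behind the voter can be written as a two-of-three Boolean expression. The short sketch below prints its full truth table; the function name `majority` is chosen here for illustration.

```python
from itertools import product


def majority(i0, i1, i2):
    """Return the value held by at least two of the three 1-bit inputs."""
    return (i0 & i1) | (i1 & i2) | (i0 & i2)


# Print the full truth table: I0 I1 I2 -> voter output.
for i0, i1, i2 in product((0, 1), repeat=3):
    print(i0, i1, i2, "->", majority(i0, i1, i2))
```

Any single wrong input is outvoted by the other two, which is the property both LTMR and GTMR rely on.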
Mitigation and Actel Antifuse Devices
ACTEL RTAX-S Architecture Basics. Super Cluster: combinatorial cells (C-CELLs); DFF cells (R-CELLs). Source: RTAX-S/SL Rad. Tolerant FPGAs, 2009, Actel.com. Embedded RHBD: hardened global clocks and resets; antifuse configuration is SEU immune; embedded localized TMR (LTMR) at each DFF (R-CELL).
Local Triple Modular Redundancy (LTMR): Smallest Area & Power. [Figure: non-mitigated design versus LTMR design.] Triple each DFF + vote. Data paths are not redundant, so there can be only one voter. Unprotected: clocks and resets (SEFI); transients (SET→SEU); internal/hidden device logic (SEFI).
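A small behavioral sketch can show what LTMR does and does not protect. It assumes a simplified model in which three flip-flop copies share one data input and feed a single voter; the class and method names are hypothetical and are not drawn from any vendor library.

```python
def majority(a, b, c):
    """Two-of-three vote on 1-bit values."""
    return (a & b) | (b & c) | (a & c)


class LtmrFlop:
    """Simplified LTMR cell: three flip-flop copies, one shared D input, one voter."""

    def __init__(self):
        self.copies = [0, 0, 0]

    def clock(self, d):
        # The data path is NOT redundant: all three copies latch the same D value.
        self.copies = [d, d, d]

    def upset(self, index):
        # Model an SEU in one flip-flop copy by inverting its stored bit.
        self.copies[index] ^= 1

    @property
    def q(self):
        return majority(*self.copies)


flop = LtmrFlop()
flop.clock(1)
flop.upset(0)                 # SEU in a single copy
print("after SEU in one copy:", flop.q)    # the voter still outputs the stored 1

intended = 1
corrupted_d = intended ^ 1    # SET on the shared data path at the clock edge
flop.clock(corrupted_d)
print("after SET on shared D:", flop.q)    # all copies latch the wrong value
```

The first case is masked by the voter, while the second is not, which is consistent with the slide listing transients (SET→SEU) as unprotected under LTMR.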
ACTEL RTAX-S Embedded Mitigation: LTMR and SETs. [Figure: Super Cluster layout showing combinatorial logic (C-CELLs), sequential logic (R-CELLs), and modules labeled TX, RX, and B.]
RTAX Example: Probability of Error Reduction. [Figure labels: 0, Low, ~0.] Error probability is per DFF bit. Error rate must reflect the frequency of operation.
Upper-Bound Error Prediction: RHBD Anti-fuse FPGA DFF. (Near) static error bit rate with no C-CELLs, P_DFFSEU (source: Actel). Dynamic error bit rate from 15 MHz to 120 MHz with 8 levels of C-CELLs, P(fs)_SET→SEU (source: NASA Goddard).
Upper-Bound Error Prediction: Actel RHBD Anti-fuse FPGA. With embedded LTMR mitigation + hardened clocks: thousands of years in LEO!
Mitigation and Xilinx Virtex Devices
Xilinx XQR4VSX55: Radiation Test Data. Source: Xilinx Consortium, VIRTEX-4VQ STATIC SEU CHARACTERIZATION SUMMARY, April 2008.
Error Rate (XQR4VSX55)                      LEO            GEO
Configuration memory, P_configuration       7.43           4.2
Combined SEFIs per device, P_SEFI           7.5 x 10^-5    2.7 x 10^-5
For non-mitigated designs the most significant upset factor is: P_configuration.
Reference: M. Berg, "Trading ASIC and FPGA Considerations for System Insertion," IEEE Nuclear Science Radiation Effects Conference, 2009.
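To make the gap between the two rates on this slide concrete, the sketch below divides the configuration memory rate by the combined SEFI rate for each orbit. It assumes both rows are expressed in the same units; the dictionary layout and names are chosen for illustration.

```python
# Error rates as listed on the slide (both rows assumed to share the same units).
rates = {
    "LEO": {"configuration": 7.43, "sefi": 7.5e-5},
    "GEO": {"configuration": 4.2, "sefi": 2.7e-5},
}


def config_to_sefi_ratio(orbit):
    """How many times larger the configuration rate is than the SEFI rate."""
    entry = rates[orbit]
    return entry["configuration"] / entry["sefi"]


for orbit in rates:
    print(f"{orbit}: configuration rate is about {config_to_sefi_ratio(orbit):.1e} x the SEFI rate")
```

In both orbits the configuration rate is roughly five orders of magnitude above the SEFI rate, which is why configuration upsets dominate a design with no mitigation.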
Global Triple Modular Redundancy (GTMR): Largest Area → Greatest Complexity. [Figure: non-mitigated design versus GTMR design.] Triple the entire design; triple I/O and voters. Unprotected: hidden device logic (SEFIs). Cannot be an embedded strategy; complex to verify. Xilinx offers XTMR.
XTMR: Capturing Asynchronous Input Data. Dynamic analysis: one domain leads the other two. [Figure: edge-detect timing waveform showing input skew among Async_data_tr0, Async_data_tr1, and Async_data_tr2.]
Time Domain Considerations: XTMR Single Bit Failures Not Detected by Static Node Analysis. [Figure: a configuration bit hit results in no edge detection.]
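One possible reading of this failure mode is sketched below: each domain runs its own rising-edge detector on a skewed copy of the asynchronous input, and the one-cycle detect pulses are then voted. The sample sequences, the stuck-at-0 fault model, and all names are assumptions made for illustration, not a description of the actual XTMR netlist.

```python
def majority(a, b, c):
    """Two-of-three vote on 1-bit values."""
    return (a & b) | (b & c) | (a & c)


def edge_pulses(samples):
    """One-cycle pulse wherever a sampled signal goes from 0 to 1."""
    previous = [samples[0]] + samples[:-1]
    return [int(p == 0 and s == 1) for p, s in zip(previous, samples)]


def voted_edges(tr0, tr1, tr2, stuck_domain=None):
    """Vote the per-domain edge pulses; optionally force one domain's pulses to 0."""
    pulses = [edge_pulses(tr0), edge_pulses(tr1), edge_pulses(tr2)]
    if stuck_domain is not None:
        pulses[stuck_domain] = [0] * len(pulses[stuck_domain])
    return [majority(a, b, c) for a, b, c in zip(*pulses)]


# Input skew: domain 0 samples the rising input one clock cycle before domains 1 and 2.
tr0 = [0, 0, 1, 1, 1]
tr1 = [0, 0, 0, 1, 1]
tr2 = [0, 0, 0, 1, 1]

print("fault free:      ", voted_edges(tr0, tr1, tr2))
print("domain 1 faulted:", voted_edges(tr0, tr1, tr2, stuck_domain=1))
```

In the fault-free case the two lagging domains agree and the voted pulse appears. With a single fault in one lagging domain, the leading and the remaining lagging domain never pulse in the same cycle, so the edge is missed even though a static, node-by-node analysis would count a single fault as masked.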
Voters and Asynchronous Signal Capture. Place the voter after the metastability filters. This satisfies skew constraints because the voter is anchored at DFF control points. [Figure: voter placement.]
Upper-Bound Error Prediction: Xilinx FPGA XTMR. P_configuration: ??? SEUs are insignificant. MBUs may be insignificant (still under investigation). Assumes proper scrubbing. Assumes unmitigated SEFIs are the predominant error source.
Tools
Mitigation and Actel Tools. Mentor Graphics has offered LTMR for anti-fuse devices. There is a desire to apply LTMR to Actel flash-based products. DTMR is another approach (GTMR with no clock redundancy): for flash devices, and to assist with SETs in anti-fuse devices.
Mitigation and Xilinx Tools. XTMR is currently commercially available from Xilinx. NASA REAG has identified some issues: asynchronous domain crossings; verification of XTMR insertion. Mentor is now evaluating GTMR with formal checking. NASA REAG expects to use Mentor GTMR (preliminary version) for V5 radiation testing.