Fault-Tolerant Resynthesis for Dual-Output LUTs
Roy Lee (1), Yu Hu (1), Rupak Majumdar (2), Lei He (1) and Minming Li (3)
(1) Electrical Engineering Dept., UCLA
(2) Computer Science Dept., UCLA
(3) Computer Science Dept., City University of Hong Kong
Address comments to: Dr. Lei He (lhe@ee.ucla.edu)

Outline
- Background and Problem Formulation
- Algorithms
- Experimental Results
- Conclusion and Future Work

Motivation
- Like CPUs and ASICs, FPGAs are susceptible to soft errors
  - Permanent ones are handled by periodic scrubbing
  - Transient ones are handled by TMR
- TMR has 5-6x area/power overhead
  - Unbearably expensive for non-mission-critical applications such as internet routers
- No selective TMR flow exists that obtains a desired MTBF with minimal area/power overhead

Recent Research for SER
- SEU (MCU) aware FPGA routing
  - [Bozorgzadeh, DAC’07]
- Device and architecture co-optimized for SER
  - [ICCAD’07][ISFPGA’08]
- Logic synthesis for MTBF optimization
  - [Best paper nomination, ICCAD’08]

Deterministic vs. Stochastic
[Figure: a two-LUT network with inputs x1, x2 and output y, shown in the deterministic Boolean space (LUT bits are 0/1) and in the stochastic Boolean space (LUT bits annotated with probabilities such as P(0)=0.8 and P(1)=0.7). Labels: defect rate = 0.01; stochastic yield rate = 0.5.]

Key Design Freedom: Logic Masking
- SEUs are created equally but not propagated equally
- Stochastic logic synthesis increases MTBF by 30% without area/power/delay overhead [ICCAD’08]
[Figure: a LUT bit marked "defect" that is not affected by defects, because of observability don’t-cares with a=1 & b=1.]

More Opportunity in Modern FPGAs
[Figure: Xilinx Virtex-5 LUT; Altera Stratix III ALM]
- Dual-output (DO) LUT
  - Merges two small LUTs into one dual-output LUT
  - Originally designed to increase logic density

Fault Masking Using DO LUTs
[Figure: masking a 0 -> 1 fault with a Xilinx Virtex-5 dual-output LUT; LUT# = 3, Level = 2]
- Duplication and encoding each need one spare pin

Potential of the Optimization using DO LUTs
- Pin utilization rate is low with state-of-the-art logic synthesis

Problem Formulation: Fault Tolerance Using Dual-Output LUTs
- Given:
  - A mapped circuit for K-input 2-output LUTs
- Design freedom:
  - Perform duplication and encoding
- Optimization objective:
  - Minimize the full-chip fault rate
- Two approaches
  - Full masking (FMD): encode all fanouts
  - Partial masking (PMD): encode part of the fanouts

Fault Modeling
- Assume a stochastic single fault model
  - At most one fault occurs at a time
  - Faults follow an identical random distribution for each SRAM bit
- The criticality of an SRAM bit
  - Combination of observability and signal probability
  - Measured by the percentage of input vectors that cause observable output errors if a fault occurs in this SRAM bit
- The fault rate of a full chip is the percentage of input vectors that cause observable output errors under the single fault assumption
  - Fault rate is the average criticality of all SRAM bits
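The two definitions above can be sketched on a toy circuit. This is an illustration only, not the authors' tool: the two-LUT circuit, the helper names (`lut_eval`, `circuit`, `criticality`, `fault_rate`) and the assumption of uniformly distributed input vectors are all hypothetical.

```python
from itertools import product

def lut_eval(table, inputs):
    """Look up a LUT output; inputs[0] is the most significant address bit."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]

def circuit(luts, x):
    """Hypothetical 2-LUT circuit: y = L2(L1(x0, x1), x2)."""
    n1 = lut_eval(luts["L1"], (x[0], x[1]))
    return lut_eval(luts["L2"], (n1, x[2]))

def criticality(luts, name, bit, n_inputs=3):
    """Fraction of input vectors whose output changes when one SRAM bit flips."""
    faulty = {k: list(v) for k, v in luts.items()}
    faulty[name][bit] ^= 1  # single fault: flip exactly one SRAM bit
    vectors = list(product((0, 1), repeat=n_inputs))
    errors = sum(circuit(luts, x) != circuit(faulty, x) for x in vectors)
    return errors / len(vectors)

def fault_rate(luts):
    """Average criticality over every SRAM bit of every LUT."""
    crits = [criticality(luts, name, bit)
             for name, table in luts.items()
             for bit in range(len(table))]
    return sum(crits) / len(crits)

# Assumed example contents: both LUTs implement a 2-input AND.
LUTS = {"L1": [0, 0, 0, 1], "L2": [0, 0, 0, 1]}

print(criticality(LUTS, "L1", 0))  # 0.125
print(criticality(LUTS, "L2", 0))  # 0.375
print(fault_rate(LUTS))            # 0.1875
```

Bits of L1 are less critical than the low-address bits of L2 here because an error at L1 only propagates when x2 = 1, which is the logic-masking effect the deck relies on.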

FMD Impact on SRAM Criticality

LSB (Average Crit. = 0.125)        Duplication (Average Crit. = 0.25)
Input   Output     Crit.           Input   Output   Crit.
000     0          0.2             000     0        0.2
001     1          0.2             001     1        0.2
010     0          0.4             010     0        0.4
011     1          0.2             011     1        0.2
100     not used   0               100     0        0.2
101     not used   0               101     1        0.2
110     not used   0               110     0        0.4
111     not used   0               111     1        0.2

FMD Impact on SRAM Criticality (cont.)

Average Crit. = 0.25               AND Encoding (Average Crit. = 0.08)
Input   Output   Crit.             Input   Output   Crit.
000     0        0.2               000     0        0
001     1        0.2               001     1        0.2
010     0        0.4               010     0        0
011     1        0.2               011     1        0.2
100     0        0.2               100     0        0
101     1        0.2               101     1        0.2
110     0        0.4               110     0        0
111     1        0.2               111     1        0.2

Average criticality is reduced after duplication.

FMD Impact on SRAM Criticality (cont.)

Average Crit. = 0.125              AND Encoding (Average Crit. = 0.125)
Input   Output     Crit.           Input   Output       Crit.
000     0          0.2             000     0            0.2
001     1          0.2             001     1            0.2
010     0          0.4             010     don’t care   0
011     1          0.2             011     don’t care   0
100     not used   0               100     don’t care   0
101     not used   0               101     don’t care   0
110     not used   0               110     0            0.4
111     not used   0               111     1            0.2

Average criticality is unchanged after encoding.
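In the AND-encoding table, the entries that store 0 are listed with criticality 0. A minimal sketch of why that can happen, under the assumption that the encoding ANDs the output of a faulty copy with the output of an intact duplicate: a 0 -> 1 flip is hidden by the AND, while a 1 -> 0 flip is not. The function name `masked_by_and` and the XOR example table are hypothetical.

```python
def masked_by_and(table, bit):
    """True if flipping `bit` in one copy is hidden by ANDing with the intact copy."""
    faulty = list(table)
    faulty[bit] ^= 1  # single fault in one of the two copies
    for addr in range(len(table)):
        # The encoded value is the AND of both copies' outputs.
        if (table[addr] & faulty[addr]) != table[addr]:
            return False
    return True

# Assumed example: a 2-input XOR stored in a 4-entry LUT.
XOR = [0, 1, 1, 0]
result = {bit: masked_by_and(XOR, bit) for bit in range(len(XOR))}
print(result)  # {0: True, 1: False, 2: False, 3: True}
```

Entries 0 and 3 store 0, so a flip there is masked; entries 1 and 2 store 1, so a flip there still reaches the fanout.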

FMD ILP Formulation
- Requirement for applying FMD:
  - The LUT to be duplicated and each of its fanout LUTs must have at least one unoccupied input pin
- ILP formulation:
  - Decision variables: 1 indicates duplication of LUT L
  - Criticality reduction due to duplication of LUT L
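The selection problem can be illustrated with a small exhaustive search. This is not the authors' ILP, whose exact objective and constraints are not given in the text. It assumes that the goal is to maximize the total criticality reduction, that duplicating a LUT consumes one spare pin on that LUT and one on each of its fanout LUTs, and that no LUT may use more spare pins than it has. Exhaustive enumeration stands in for an ILP solver, and all names and numbers are hypothetical.

```python
from itertools import product

def best_duplication(spare_pins, fanouts, reduction):
    """Enumerate every 0/1 assignment and keep the feasible one with the largest gain."""
    names = sorted(spare_pins)
    best_gain, best_choice = 0.0, ()
    for bits in product((0, 1), repeat=len(names)):
        chosen = [n for n, b in zip(names, bits) if b]
        used = dict.fromkeys(names, 0)
        for lut in chosen:
            used[lut] += 1          # assumed: one pin on the duplicated LUT
            for fo in fanouts[lut]:
                used[fo] += 1       # assumed: one pin on each encoding fanout
        if any(used[n] > spare_pins[n] for n in names):
            continue                # violates the spare-pin requirement
        gain = sum(reduction[n] for n in chosen)
        if gain > best_gain:
            best_gain, best_choice = gain, tuple(chosen)
    return best_gain, best_choice

# Hypothetical netlist: A and B both feed C, and C feeds D.
spare_pins = {"A": 1, "B": 1, "C": 1, "D": 0}
fanouts = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
reduction = {"A": 0.05, "B": 0.08, "C": 0.02, "D": 0.01}

gain, choice = best_duplication(spare_pins, fanouts, reduction)
print(gain, choice)  # 0.08 ('B',)
```

A and B compete for the single spare pin on C, so only the one with the larger criticality reduction is duplicated. The search is exponential in the number of LUTs and is only meant for toy inputs.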

Partial Masking-based Duplication (PMD)
- Only part of the fanout LUTs are encoded
[Figure: (a) Original LUTs; (b) LUTs after duplication and encoding]
- A generalization of Full Masking-based Duplication (FMD)

Experimental Settings
- Benchmarks
  - Biggest 20 MCNC circuits
  - Mapped to 6-LUTs by Berkeley ABC
- A LUT-merger algorithm [Ahmed et al., FPGA’07] is used for area reduction
- Full-chip fault rate is verified by Monte Carlo simulation with 5K input vectors
- Three CAD flows are examined
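A Monte Carlo fault-rate check of the kind mentioned above might look like the following sketch. The sampling scheme (one uniformly chosen SRAM bit flip and one uniformly random input vector per sample), the toy two-LUT circuit and every name are assumptions; the slides do not describe how the 5K vectors are drawn.

```python
import random

# Assumed toy circuit: y = L2(L1(x0, x1), x2), both LUTs implementing AND.
L1 = [0, 0, 0, 1]
L2 = [0, 0, 0, 1]

def evaluate(l1, l2, x):
    """Evaluate the two-LUT circuit on a 3-bit input vector."""
    n1 = l1[(x[0] << 1) | x[1]]
    return l2[(n1 << 1) | x[2]]

def monte_carlo_fault_rate(samples=5000, seed=0):
    """Estimate the fault rate from random (single fault, input vector) pairs."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(samples):
        f1, f2 = list(L1), list(L2)
        target = rng.choice((f1, f2))      # pick which LUT is hit
        target[rng.randrange(4)] ^= 1      # flip one of its SRAM bits
        x = [rng.randint(0, 1) for _ in range(3)]
        errors += evaluate(L1, L2, x) != evaluate(f1, f2, x)
    return errors / samples

estimate = monte_carlo_fault_rate()
print(estimate)
```

For this toy circuit, exhaustive enumeration of all 8 SRAM bits and all 8 input vectors gives a fault rate of 0.1875, so the estimate should land close to that value.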

Experimental CAD Flows

MTTF & Area Overhead
[Figure: four bar charts comparing the flows fmd, pmd, fmd-R and pmd-R (A): MTTF improvement (combinational), MTTF improvement (sequential), area overhead (combinational), area overhead (sequential). MTTF improvement axes run up to 120%; area overhead axes run from 0% to 40%.]

MTTF & Performance Overhead
[Figure: four bar charts comparing the flows fmd, pmd, fmd-R and pmd-R (A): MTTF improvement (combinational), MTTF improvement (sequential), logic depth overhead (combinational), logic depth overhead (sequential). MTTF improvement axes run up to 120%; logic depth overhead axes run from 0% to about 20%.]

Conclusions and Future Work
- Proposed a novel fault-tolerant technique using dual-output LUTs
  - 2X MTTF increase with 24% area overhead
- In the future, we will consider
  - Different types of encoding logic
  - Dual-output-aware physical synthesis that considers interconnects explicitly
  - Path-based duplication
- Build a selective TMR flow

Thank you!
Electronic Design Automation Group, Electrical Engineering, UCLA
Website: http://eda.ee.ucla.edu