Synchronization Issues of TMR Crossing Multiple Clock Domains

Yubo Li, Ph.D. Student
Dr. Brent Nelson, Professor
Dr. Mike Wirthlin, Professor
Brigham Young University

Introduction
• TMR suffers from 3 issues when crossing clock domains
  1. Meta-stability causes problems when asynchronously sampling
  2. Sampling uncertainty is inevitable when sampling asynchronous signals
  3. SEUs exacerbate the effect of sampling uncertainty, resulting in lower reliability

Issue #1: Meta-stability
[Figure: synchronizer circuit spanning the sender's domain and the receiver's domain; labels: sendSig, S-R latch, nodes a, b, c, D flip-flops clocked by rcvClk, rcvSig]
• Resolution time: Tr = Tclk_receiver - Tpd
• Example: K1 = 0.1 ns, K2 = 24.3/ns (Xilinx Virtex 4), Fs = 100 MHz, Fr = 333 MHz, Tpd = 0.5 ns
• MTBF = 2.3 × 10^12 years
Reference: Peter Alfke, "Metastable delay in Virtex FPGAs", tech. report, Xilinx, 2008
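
The example can be approximately reproduced with the usual exponential metastability model, MTBF = e^(K2 × Tr) / (K1 × Fs × Fr). The slide's own equation did not survive extraction, so this form is an assumption, and the function name below is hypothetical. A minimal sketch:

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def synchronizer_mtbf_years(k1_s, k2_per_s, fs_hz, fr_hz, tpd_s):
    """Assumed model: MTBF = exp(K2 * Tr) / (K1 * Fs * Fr)."""
    # Resolution time from the slide: Tr = Tclk_receiver - Tpd
    tr_s = 1.0 / fr_hz - tpd_s
    mtbf_s = math.exp(k2_per_s * tr_s) / (k1_s * fs_hz * fr_hz)
    return mtbf_s / SECONDS_PER_YEAR

# Slide values: K1 = 0.1 ns, K2 = 24.3/ns, Fs = 100 MHz, Fr = 333 MHz, Tpd = 0.5 ns
mtbf = synchronizer_mtbf_years(0.1e-9, 24.3e9, 100e6, 333e6, 0.5e-9)
print(f"{mtbf:.2e} years")
```

With these inputs the result is on the order of 10^12 years. The quoted 2.3 × 10^12 follows if the receiver period is rounded to 3 ns, so that Tr = 2.5 ns; the exponential makes the result very sensitive to that rounding.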

Issue #2: Sampling Uncertainty (1/2)
• Sampling uncertainty is inevitable when sampling asynchronous signals
  - Even with balanced delays on the 3 interconnect legs, signals arrive in receiver's domain on different cycles
[Figure: Modules A, B, C in the sender's domain drive sendSig through delayA, delayB, delayC into flip-flops clocked by rcvClk in the receiver's domain, producing rcvSig_A, rcvSig_B, rcvSig_C; the timing diagram marks the disagreement]
• # disagreements/sec = [equation not preserved in the extracted text]
• Example: dmax - dmin = 400 ps, i = 1, Fs = 100 MHz, Fr = 30 MHz; # disagreements = 1.2 × 10^6/sec
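
The rate equation is missing from the extracted slide, but the example values are consistent with the product i × (dmax - dmin) × Fs × Fr. One plausible reading is that sender events occur at a rate proportional to i × Fs and a receiver clock edge lands inside the skew window with probability (dmax - dmin) × Fr. Treat this form as an inference from the numbers, not the slide's verbatim equation; the function and parameter names are hypothetical.

```python
def disagreements_per_sec(i, skew_s, fs_hz, fr_hz):
    """Inferred relation: rate = i * (dmax - dmin) * Fs * Fr."""
    return i * skew_s * fs_hz * fr_hz

# Slide example: dmax - dmin = 400 ps, i = 1, Fs = 100 MHz, Fr = 30 MHz
rate = disagreements_per_sec(i=1, skew_s=400e-12, fs_hz=100e6, fr_hz=30e6)
print(f"{rate:.2e} disagreements/sec")
```

This reproduces the quoted 1.2 × 10^6 disagreements/sec.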

Issue #2: Sampling Uncertainty (2/2)
• Created a real FPGA design to verify the equation: # disagreements/sec = [equation not preserved in the extracted text]
• Experimental results:
  - i is fixed at 1 in all experiments.
  - The value of dmax-dmin measured by FPGA Editor is 0.615 ns.

Fs (MHz)   Fr (MHz)   Disagreements/sec   dmax-dmin (ns)
100        50         2019785             0.404
100        40         1606207             0.402
100        30         1200021             0.400
100        20         803698              0.402
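
The dmax-dmin column appears to be back-calculated from the measured rates. Assuming the rate is the product i × (dmax - dmin) × Fs × Fr (an inference from the slide's numbers, since the equation did not survive extraction), the sketch below inverts that relation for each row. The helper name is hypothetical.

```python
# (Fs in MHz, Fr in MHz, measured disagreements/sec), copied from the table
rows = [(100, 50, 2019785), (100, 40, 1606207), (100, 30, 1200021), (100, 20, 803698)]

def implied_skew_ns(fs_mhz, fr_mhz, rate, i=1):
    """Invert the assumed relation rate = i * skew * Fs * Fr, returning skew in ns."""
    skew_s = rate / (i * fs_mhz * 1e6 * fr_mhz * 1e6)
    return skew_s * 1e9

skews = [round(implied_skew_ns(fs, fr, n), 3) for fs, fr, n in rows]
print(skews)
```

Each row comes out at roughly 0.40 ns, matching the table's last column. That is smaller than the 0.615 ns reported by FPGA Editor, a gap the slide does not explain.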

Issue #3: SEUs
• SEUs + sampling uncertainty → TMR failure
[Figure: timing diagrams of A, B, C and the voter output for Cases 1, 2, and 3, first without SEUs, then with SEUs where A has a stuck-at-'0' fault]
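
To see why the combination is harmful, consider a plain 2-of-3 majority voter. The individual cases in the figure are not recoverable, so the one-cycle pulse waveforms below are hypothetical and only illustrate the mechanism: without an SEU, one late leg is outvoted; with one leg stuck at '0', the two healthy legs must agree in the same cycle, and sampling uncertainty can prevent that.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)

def vote(leg_a, leg_b, leg_c):
    """Vote cycle by cycle over three sampled waveforms."""
    return [majority(a, b, c) for a, b, c in zip(leg_a, leg_b, leg_c)]

# Hypothetical receiver-side samples, one value per receiver clock cycle
on_time = [0, 1, 0, 0]      # pulse sampled in cycle 1
one_late = [0, 0, 1, 0]     # same pulse sampled one cycle later
stuck_at_0 = [0, 0, 0, 0]   # leg disabled by an SEU

without_seu = vote(on_time, on_time, one_late)   # two legs agree in cycle 1
with_seu = vote(stuck_at_0, on_time, one_late)   # healthy legs never agree
print(without_seu, with_seu)
```

In the second call the pulse is lost entirely, even though only one leg is faulty.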

Typical Synchronizers (1/2)
• For sending a signal from a faster clock domain to a slower clock domain
[Figure: Synchronizer 1 circuit (sendSig, S-R latch, D flip-flops clocked by rcvClk, rcvSig) with a timing diagram of sendSig, rcvClk, a, b, c, rcvSig]

Typical Synchronizers (2/2)
• For sending a signal from a faster clock domain to a slower clock domain
[Figure: Synchronizer 2 circuit (sendSig, S-R latch, D flip-flops clocked by rcvClk, rcvSig) with a timing diagram of sendSig, rcvClk, a, b, c, d, rcvSig]

Mitigation Solutions (1/2)
• Solution 1
  - Voter inputs: rcvSig_A, A_prev, rcvSig_B, B_prev, rcvSig_C, C_prev
  - Voter terms shown: A_prev rcvSig_B, A_prev rcvSig_C, B_prev rcvSig_A, B_prev rcvSig_C, C_prev rcvSig_A, C_prev rcvSig_B, …
  - This is just a 6-LUT + flip-flop. Need 3 copies.
[Figure: block diagram with sendSig_A, sendSig_B, Synchronizer #1, the voter producing q, and an edge detector producing output; timing diagram with SEUs (rcvSig_A stuck low at '0', rcvSig_B stuck high at '1') showing rcvSig_C, q, and output for Case 1 and Case 2]
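
The general idea of voting over current and previous samples can be sketched as follows. This is not the slide's LUT equation, whose full term list is elided; it assumes, purely for illustration, that a leg counts as asserted if it is high in the current or the previous receiver cycle, that q is the 2-of-3 majority of those values, and that the edge detector emits one pulse per rising edge of q. All function names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)

def mitigated_output(leg_a, leg_b, leg_c):
    """Edge-detected voter output, one value per receiver cycle (assumed logic)."""
    prev = (0, 0, 0)   # A_prev, B_prev, C_prev
    q_prev = 0
    out = []
    for cur in zip(leg_a, leg_b, leg_c):
        # Assumption: a leg is "seen" if high now or high one cycle ago
        seen = [c | p for c, p in zip(cur, prev)]
        q = majority(*seen)
        out.append(q & (1 - q_prev))   # rising-edge detector on q
        prev, q_prev = cur, q
    return out

idle = [0, 0, 0, 0, 0]
early = [0, 1, 0, 0, 0]
late = [0, 0, 1, 0, 0]
high = [1, 1, 1, 1, 1]

skew_only = mitigated_output(early, early, late)   # no SEU, one late leg
a_stuck_low = mitigated_output(idle, early, late)  # rcvSig_A stuck at '0'
b_stuck_high = mitigated_output(early, high, late) # rcvSig_B stuck at '1'
b_high_idle = mitigated_output(idle, high, idle)   # stuck-high leg, no event
print(skew_only, a_stuck_low, b_stuck_high, b_high_idle)
```

Under these assumptions each real event yields exactly one output pulse in all three faulty or skewed scenarios, and a single stuck-high leg alone never produces a pulse. The edge detector matters because q can stay high for several cycles when samples straddle a cycle boundary.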

Mitigation Solutions (2/2)
• Solution 2
[Figure: block diagram with sendSig_A, sendSig_B, sendSig_C, Synchronizer #2 blocks, rcvSig_A, rcvSig_B, rcvSig_C, voters producing q_A, q_B, q_C, and edge detectors producing output_A, output_B, output_C; timing diagram with SEUs (rcvSig_A stuck low at '0', rcvSig_B stuck high at '1') showing rcvSig_C, q, and output for Case 1 and Case 2]

Synchronizer: Slow to Fast Domain
• Different synchronizer to use when sending a signal from a slower domain to a faster one
[Figure: synchronizer circuit (flip-flop with D tied to '1', clocked by sendSig, with CLR; nodes x, y, z; D flip-flops clocked by rcvClk; rcvSig) with a timing diagram of sendSig, rcvClk, x, y, z, rcvSig]
• Same mitigation solutions apply to this synchronizer

Reliability Comparison
• Mitigated vs. unmitigated synchronizer designs
• Four orders of magnitude improvement in terms of MTTF: 2 × 10^11 vs 8 × 10^6 seconds
[Figure: plot comparing the mitigated, TMR-ed synchronizer with the unmitigated synchronizer; X-axis: size of mitigated ÷ unmitigated]
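
As a quick check of the "four orders of magnitude" figure, the two MTTF values can be compared directly. The variable names are illustrative only.

```python
import math

mttf_mitigated_s = 2e11     # mitigated, TMR-ed synchronizer (from the slide)
mttf_unmitigated_s = 8e6    # unmitigated synchronizer (from the slide)

ratio = mttf_mitigated_s / mttf_unmitigated_s
orders = math.log10(ratio)
print(f"ratio = {ratio:.0f}, about {orders:.1f} orders of magnitude")
```

The ratio is 25,000, or about 4.4 orders of magnitude. In everyday units that is roughly 6,300 years versus about 93 days.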

Conclusion
• Mitigation solutions for sending signals across clock domains were presented
  - Even though the area increases, reliability is measurably improved
• Future Work
  - Hand-shake protocols
  - Bundled data
  - Asynchronous FIFOs
  - Demonstrate reliability improvement of synchronizers using fault injection