Fault Tolerant State Machines
Gary Burke, Stephanie Taft
Jet Propulsion Laboratory, California Institute of Technology
D_160 / MAPLD - 2004

Reasons for Fault Tolerant State Machines
• Reliable designs are essential for flight systems
• The state machine needs to be tolerant of single event upsets

State Machines
• A state machine is a sequential machine that, when built into an FPGA or ASIC, controls the sequencing of actions in the digital logic
• The current state of the machine is held in a state register, which is updated on each clock
• The next value of the state register (next state) is derived from the current state and the inputs
• Outputs from the state machine are decoded from the state register and can also be combined with the inputs

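The structure described above (state register, next-state logic, output decoding) can be modeled in a few lines of software. This is only an illustrative sketch: the class name `StateMachine`, the `clock` method, and the 2-bit counter example are hypothetical and do not come from the presentation.

```python
class StateMachine:
    """Minimal software model of a clocked state machine (illustrative only)."""

    def __init__(self, next_state, output, reset_state=0):
        self.next_state = next_state  # combinational next-state logic
        self.output = output          # output decode from state and inputs
        self.state = reset_state      # the state register

    def clock(self, inputs):
        """Model one clock: decode the output, then update the state register."""
        out = self.output(self.state, inputs)
        self.state = self.next_state(self.state, inputs)
        return out


# Hypothetical example: a 2-bit counter with an enable input whose
# output is asserted while the counter is in its last state.
counter = StateMachine(
    next_state=lambda s, en: (s + 1) % 4 if en else s,
    output=lambda s, en: s == 3 and bool(en),
)
outputs = [counter.clock(1) for _ in range(4)]
```

Here the output depends on both the state and the input, matching the last bullet; dropping the `inputs` argument from `output` would give an output decoded from the state register alone.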
State Machine Encoding
• Each distinct state of a state machine is represented by a unique binary code
• Encoding is the assignment of binary codes to states

Different Methods of Encoding States
• Binary
  – The simplest encoding method, in which each state is given the next available binary number in sequence
• One Hot
  – The number of bits in the code is equal to the number of states
  – Each encoded state has exactly one bit set to 1 (the rest are 0)

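As a small illustration of the two encodings, the sketch below generates the code words for a given number of states. The function names `binary_codes` and `one_hot_codes` are hypothetical helpers, not part of the presentation.

```python
def binary_codes(num_states):
    """Assign each state the next binary number, using the fewest bits that fit."""
    width = max(1, (num_states - 1).bit_length())
    return [format(i, f"0{width}b") for i in range(num_states)]


def one_hot_codes(num_states):
    """Use one bit per state; exactly one bit is set in each code word."""
    return [format(1 << i, f"0{num_states}b") for i in range(num_states)]


print(binary_codes(4))   # ['00', '01', '10', '11']
print(one_hot_codes(4))  # ['0001', '0010', '0100', '1000']
```

For the 16-state machine used later in the slides, this gives 4-bit binary codes and 16-bit one-hot codes.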
Different Methods of Encoding States Continued
• Hamming Distance of 2 (H2)
  – Compared to Binary encoding, Hamming 2 uses one extra bit to ensure all codes are separated by a Hamming distance of at least 2
  – It takes at least 2 bit changes in the state register to reach another valid state
• Hamming Distance of 3 (H3)
  – This extension of Hamming distance of 2 encoding uses additional bits to ensure all codes are separated by a Hamming distance of at least 3
  – It takes at least 3 bit changes in the state register to reach another valid state

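The slides do not say how the H2 and H3 codes were constructed. One common way to obtain these distances, assumed here purely for illustration, is to append an even-parity bit to the binary code (H2) and to use Hamming(7,4) code words for 16 states (H3). The helper names below are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


def min_distance(codes):
    """Smallest Hamming distance between any two distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))


def h2_codes(num_states):
    """Assumed construction: binary code plus one even-parity bit."""
    return [(i << 1) | (bin(i).count("1") & 1) for i in range(num_states)]


def h3_codes_16():
    """Assumed construction: Hamming(7,4) code words for 16 states."""
    codes = []
    for d in range(16):
        d0, d1, d2, d3 = [(d >> k) & 1 for k in range(4)]
        p0 = d0 ^ d1 ^ d3
        p1 = d0 ^ d2 ^ d3
        p2 = d1 ^ d2 ^ d3
        codes.append((d << 3) | (p2 << 2) | (p1 << 1) | p0)
    return codes


print(min_distance(list(range(16))))  # plain binary: 1
print(min_distance(h2_codes(16)))     # binary + parity: 2
print(min_distance(h3_codes_16()))    # Hamming(7,4): 3
```

Under these assumptions a 16-state machine needs 4 bits for Binary, 5 bits for H2, and 7 bits for H3.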
Synthesis
• Each state machine was synthesized individually to check its overhead
• Finite state machine optimization was turned off
• A clock frequency of 50 MHz was used
• The target device was a Xilinx Spartan 2, speed grade 6
• Error injection circuitry was not included

Synthesis Results
[Figure: Four Bit State Encoding]
[Figure: Eight Bit State Encoding]
[Figure: Twelve Bit State Encoding]
[Figure: Sixteen Bit State Encoding]
[Figure: Twenty-Four Bit State Encoding]
[Figure: Thirty-Two Bit State Encoding]

Fault Injection Test
• A test circuit is generated with an example of each state machine executing the same task, plus a reference state machine
• The chosen task, detecting a 16-bit pattern in a serial input stream, requires a 16-state machine
• An error generator injects faults into all state machines except the reference state machine

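To make the task concrete, the sketch below builds a 16-state serial pattern detector in software. The slides do not give the transition table, so this version assumes a standard prefix-matching automaton in which state s means "s pattern bits matched so far" and the detect output is combined with the input on the final bit. The names `build_detector`, `run_detector`, and the key pattern are hypothetical.

```python
def build_detector(pattern):
    """Return table[state][bit] = (next_state, detect) for a serial detector.

    State s (0 .. len(pattern)-1) means s pattern bits have been matched.
    """
    n = len(pattern)
    # fail[j]: length of the longest proper prefix of pattern[:j]
    # that is also a suffix of it (used to fall back on a mismatch).
    fail = [0] * (n + 1)
    k = 0
    for i in range(1, n):
        while k and pattern[i] != pattern[k]:
            k = fail[k]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i + 1] = k

    table = []
    for s in range(n):
        row = []
        for bit in (0, 1):
            k = s
            while k and pattern[k] != bit:
                k = fail[k]
            if pattern[k] == bit:
                k += 1
            detect = k == n
            if detect:
                k = fail[n]  # keep any overlap with the start of the pattern
            row.append((k, detect))
        table.append(row)
    return table


def run_detector(table, bits):
    """Clock the detector over a bit stream; return indices where detect fired."""
    state, hits = 0, []
    for i, bit in enumerate(bits):
        state, detect = table[state][bit]
        if detect:
            hits.append(i)
    return hits


# Hypothetical 16-bit key pattern.
KEY = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1]
table = build_detector(KEY)
stream = [0, 1] + KEY + [0] + KEY
hits = run_detector(table, stream)
```

The table has exactly 16 rows, one per state, so each encoding from the earlier slides can be applied to the same machine.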
Error Injection Test Continued
• The outputs of each state machine are compared to the reference output
• A set of counters tallies the comparison outputs
• Two types of failure are logged for each state machine:
  – Failure to detect pattern
  – False detection of pattern (false-positive)

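The comparison and tallying step can be sketched as follows, assuming each machine produces one detect flag per clock. The function name `tally` and the sample data are hypothetical; the actual test used hardware counters.

```python
def tally(reference_out, machine_out):
    """Count the two logged failure types by comparing against the reference.

    Returns (missed, false_positive):
      missed         - reference detected the pattern, machine did not
      false_positive - machine reported a detection, reference did not
    """
    missed = 0
    false_positive = 0
    for ref, out in zip(reference_out, machine_out):
        if ref and not out:
            missed += 1
        elif out and not ref:
            false_positive += 1
    return missed, false_positive


# Hypothetical per-clock detect flags.
reference = [0, 1, 0, 1, 0]
faulty = [0, 0, 1, 1, 0]
print(tally(reference, faulty))  # (1, 1)
```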
Error Injection Test Continued
• Non-key patterns are 1 bit different from the key pattern, to increase the likelihood of a false match
• The error rate can be varied; it is set to 1:199 clocks in the example
• Errors are weighted by distributing them pseudo-randomly over 16 bits, so a state machine with a word size of n receives n/16 of the total faults
• Synchronous faults are injected before the state register
• Asynchronous faults are injected after the state register
• All results are from actual implementation of the test circuits in a Spartan 2 FPGA

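One way to read the n/16 weighting is that every fault targets one of 16 bit positions, and a machine is hit only when the chosen position falls inside its own state word. The sketch below models that reading in software; the function `inject_fault` and its parameters are hypothetical, and the hardware generator may differ.

```python
import random


def inject_fault(state, width, rng, total_bits=16):
    """Pick one of total_bits positions; flip it only if it lies in this word.

    Returns (new_state, hit). A machine with a width-bit state register is
    hit with probability width / total_bits.
    """
    position = rng.randrange(total_bits)
    if position < width:
        return state ^ (1 << position), True
    return state, False


# A 4-bit (binary-encoded) state word should receive about 4/16 of the faults.
rng = random.Random(0)
trials = 16000
hit_count = sum(inject_fault(0, 4, rng)[1] for _ in range(trials))
hit_fraction = hit_count / trials
```

With this weighting, wider encodings such as One Hot are exposed to proportionally more faults than narrow ones, which is consistent with how the slides describe the test.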
[Figure: Error Rate – Synchronous Faults]
[Figure: Error Rate – Asynchronous Faults]
[Figure: Error Rate – Asynchronous Pulse Faults]

Results: Binary Encoding
• Lowest resources used
• Second fastest speed after One Hot
  – Fastest for small number of states
• Second-most sensitive to errors
• Generates false-positive errors, i.e., reports false pattern matches

Results: One Hot Encoding
• No false-positive errors (single faults)
• Fastest speed, except for small and large numbers of states
• Uses more resources than Binary
• Inefficient for large number of states
• Worst fault tolerance of all encodings tested
• Has 2x the error rate of Binary encoding

Results: Hamming Distance of 2 (H2) Encoding
• No false-positive errors (single faults)
• Better fault tolerance than Binary
• More resources needed than One Hot, except for large number of states

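A distance of 2 means a single bit flip can never turn one valid state into another, which is why a single fault cannot silently move the machine to a wrong but valid state. The sketch below checks this for a parity-based H2 code. The parity construction and the names `h2_encode` and `is_valid` are assumptions for illustration; the slides do not describe the actual codes or recovery logic.

```python
def h2_encode(state):
    """Assumed H2 code: 4-bit binary state plus one even-parity bit."""
    return (state << 1) | (bin(state).count("1") & 1)


def is_valid(word):
    """Every valid code word in this construction has even parity."""
    return bin(word).count("1") % 2 == 0


# Flip each of the 5 bits of every code word: the result is never valid.
single_faults_detected = all(
    not is_valid(h2_encode(state) ^ (1 << bit))
    for state in range(16)
    for bit in range(5)
)
print(single_faults_detected)  # True
```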
Results: Hamming Distance of 3 (H3) Encoding
• Zero single-fault errors
  – Immune to synchronous and asynchronous errors
• Lowest double-fault errors
• Most resources used (*)
  – ~2x Binary encoding
• Slowest speed
(*) Except for large number of states

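With a distance of 3, a word corrupted by a single bit flip is still closer to its original code than to any other, so the intended state can be recovered. The sketch below demonstrates this with nearest-code decoding over Hamming(7,4) code words. Both the code construction and the `correct` function are assumptions for illustration; the slides do not describe how the H3 machines recover.

```python
def hamming74(d):
    """Assumed H3 code: Hamming(7,4) code word for a 4-bit state number."""
    d0, d1, d2, d3 = [(d >> k) & 1 for k in range(4)]
    p0 = d0 ^ d1 ^ d3
    p1 = d0 ^ d2 ^ d3
    p2 = d1 ^ d2 ^ d3
    return (d << 3) | (p2 << 2) | (p1 << 1) | p0


CODES = [hamming74(d) for d in range(16)]


def correct(word):
    """Return the state whose code word is nearest to the (possibly faulty) word."""
    return min(range(16), key=lambda s: bin(CODES[s] ^ word).count("1"))


# Every single-bit fault in every state decodes back to the original state.
all_corrected = all(
    correct(CODES[state] ^ (1 << bit)) == state
    for state in range(16)
    for bit in range(7)
)
print(all_corrected)  # True
```

In hardware this correction would typically be folded into the next-state logic rather than done by search, which is one plausible source of the extra resources reported above.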
Summary
• Binary encoding gives unpredictable results when faults are injected, generating false-positive errors in the pattern matching example
• One Hot encoding provides false-positive protection, but at the cost of considerably more errors
• Hamming 2 encoded state machines provide significantly better fault tolerance at a cost of about 25% more resources than Binary
• Hamming 3 encoded state machines give excellent fault tolerance, but at a ~2x increase in resources