11. Practical Fault-Tolerant System Design
Reliable System Design, 2005, by Amir M. Rahmani
Past and Future
So far, we have studied:
- Some techniques for designing FT systems
- Some redundancy techniques
- Some techniques for measuring FT parameters
- Some techniques for evaluating FT systems
Now, we learn:
- The complete process of designing FT systems
- How to apply the previously learned knowledge to design FT systems
Design of fault-tolerant systems
- Trade-offs
  - Old computer vs. new computer
- Fault-tolerant techniques
- Fault-avoidance techniques
- System evaluations
Design Process
1 - Problem definition
- Completely understand the problem
2 - System requirements
- Reliability, availability, coverage and latency, speed, power consumption, cost, weight, size, etc.
- Ex 1: Is a computer with reliability 0.99999 worthwhile if it controls a motor whose reliability is only 0.9?
- Ex 2: A fault coverage of 0.99? For which faults?
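To make Ex 1 concrete: if the computer and the motor are treated as blocks in series with independent failures (an assumption), the system works only when both work, so their reliabilities multiply. The minimal Python sketch below shows that the result, 0.99999 × 0.9 = 0.899991, is dominated by the motor, so the extra nines in the computer buy almost nothing. The helper name `series_reliability` is illustrative, not from the slides.

```python
import math

def series_reliability(reliabilities):
    """Reliability of a series system: every block must work, so
    (assuming independent failures) the block reliabilities multiply."""
    return math.prod(reliabilities)

computer = 0.99999  # reliability of the controlling computer (Ex 1)
motor = 0.9         # reliability of the controlled motor (Ex 1)

system = series_reliability([computer, motor])
print(f"system reliability = {system:.6f}")
# A series system can never be more reliable than its weakest block.
```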
Design Process (cont.)
3 - System partitioning
- Partition the system into manageable subsystems
- Partition the system based on:
  - Reliability
  - Availability
  - Criticality
- Example: reliability requirements of aircraft subsystems
  - Flight-critical functions: R(t) = 0.99999 (e.g., fly-by-wire (flap))
  - Mission-critical functions: R(t) = 0.995 (e.g., telecommunication)
  - Convenience functions: R(t) = 0.95 (e.g., "No smoking" lamp)
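One way to turn these R(t) targets into a component requirement is to assume a constant failure rate, so that R(t) = exp(-λt) and the largest allowed rate is λ = -ln R / t. The sketch below assumes that exponential model and a hypothetical 10-hour mission; neither the mission time nor the helper names (`max_failure_rate`, `TARGETS`) come from the slides.

```python
import math

# Reliability targets per criticality class (from the aircraft example).
TARGETS = {
    "flight-critical": 0.99999,   # e.g. fly-by-wire (flap)
    "mission-critical": 0.995,    # e.g. telecommunication
    "convenience": 0.95,          # e.g. "No smoking" lamp
}

def max_failure_rate(target_r, mission_hours):
    """Largest constant failure rate (per hour) that still meets target_r
    over mission_hours, assuming R(t) = exp(-lambda * t)."""
    return -math.log(target_r) / mission_hours

MISSION_HOURS = 10.0  # hypothetical mission time, not from the slides

for cls, r in TARGETS.items():
    lam = max_failure_rate(r, MISSION_HOURS)
    print(f"{cls:17s} R={r:<8} lambda <= {lam:.3e} per hour")
```

The flight-critical class ends up needing a failure rate roughly three orders of magnitude lower than the convenience class, which is why partitioning by criticality keeps the expensive techniques confined to the subsystems that need them.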
Design Process (cont.)
4 - Candidate designs
- E.g., TMR vs. duplication
- Examining the advantages of one approach will uncover the disadvantages of another
5 - High-level analysis
- Basic analysis based on:
  - Reliability estimation
  - Cost estimation
  - Size estimation
  - Complexity estimation
  - Weight estimation
  - Etc.
- Omit some designs
  - Ex: omitting the TMR approach (good reliability but high weight)
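A high-level comparison of TMR and duplication can start from textbook reliability expressions. The sketch below assumes independent module failures with module reliability R, a perfect voter for TMR (R_TMR = 3R^2 - 2R^3), and a duplex pair whose single-module failures are detected and handled with coverage c (R_dup = R^2 + 2cR(1 - R)). The relative weights (one unit per module, voter and comparator ignored) and the helper names are hypothetical; the point is only to show how a weight budget can eliminate TMR even when its reliability is attractive.

```python
def tmr(r):
    """TMR with a perfect voter: works if at least 2 of 3 modules work."""
    return 3 * r**2 - 2 * r**3

def duplex(r, c):
    """Duplex with coverage c: both modules work, or exactly one fails
    and that failure is detected and handled (probability c)."""
    return r**2 + 2 * c * r * (1 - r)

def candidates(r, c):
    # name -> (reliability, hypothetical relative weight = number of modules)
    return {
        "simplex": (r, 1),
        "duplex": (duplex(r, c), 2),
        "TMR": (tmr(r), 3),
    }

def screen(r, c, weight_budget):
    """Omit candidates over the weight budget; rank the rest by reliability."""
    kept = [(name, rel) for name, (rel, w) in candidates(r, c).items()
            if w <= weight_budget]
    return sorted(kept, key=lambda item: item[1], reverse=True)

print(screen(0.9, 0.8, weight_budget=3))
print(screen(0.9, 0.8, weight_budget=2))  # TMR omitted on weight grounds
```

With R = 0.9 and c = 0.8, TMR (0.972) ranks above the duplex pair (0.954), yet a budget of two modules omits it, as in the slide's example. Note also that 3R^2 - 2R^3 < R whenever R < 0.5, so TMR only helps when the modules themselves are reasonably reliable.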
Design Process (cont.)
6 - Hardware & software specifications
- The specifications for the hardware and software must be developed
7 - Hardware & software design
- Analysis: commercial aircraft control problem
- NASA solutions:
  - FTMP (Fault-Tolerant Multiprocessor)
  - SIFT (Software Implemented Fault Tolerance)
Design Process (cont.)
8 - Testing
- An extremely important part of the design process; it reveals:
  - Design mistakes
  - Implementation mistakes
  - Component defects
9 - System integration
- Combining the hardware and software so that they work together correctly
10 - Final testing
11 - Documentation
Example in Johnson's book: aircraft computer system
Some concepts
- Fault prevention
  - How to prevent, through construction, the occurrence or introduction of faults
  - Examples:
    - Design methodologies
    - Selecting high-quality components
    - Design reviews
- Fault tolerance
  - How to provide, through redundancy, a service that fulfills the system function in spite of faults
  - Examples:
    - Redundant HW/SW
    - Voting
    - Reconfiguration
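Voting and reconfiguration can be sketched together: a majority voter masks a faulty module's output, and the modules that disagreed with the majority become candidates for being switched out. The Python sketch below assumes exact-match comparison of outputs (voters on analog or sensor data often need inexact voting instead); the module names, values, and the `majority_vote` helper are hypothetical.

```python
from collections import Counter

def majority_vote(outputs):
    """outputs: dict mapping module id -> output value.
    Returns (voted_value, suspects), where suspects are the modules that
    disagreed with the strict majority. Raises if there is no majority."""
    value, count = Counter(outputs.values()).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise RuntimeError("no majority: the fault cannot be masked")
    suspects = [m for m, out in outputs.items() if out != value]
    return value, suspects

active = {"A", "B", "C"}               # modules currently in service
outputs = {"A": 42, "B": 42, "C": 17}  # module C is faulty in this run
value, suspects = majority_vote(outputs)
active -= set(suspects)                # reconfigure: retire the disagreeing module
print(value, sorted(active))
```

Voting alone masks the fault; removing the suspect afterwards keeps a second fault in the remaining modules from being outvoted by a stale faulty one.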
Some concepts (cont.)
- Fault removal
  - How to reduce, through verification, diagnosis, and correction, the presence of faults
  - Examples:
    - Inspection or walk-through
    - Data flow analysis
    - Proof of correctness
    - System behavior analysis (Petri nets)
- Fault forecasting
  - How to estimate, by evaluation, the presence, creation, and consequences of failures
  - Examples:
    - Failure Mode and Effects Analysis (FMEA)
    - Markov chains
    - Reliability block diagrams
    - Fault trees
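Of the fault-forecasting tools listed, fault trees are the simplest to evaluate numerically: with independent basic events, an OR gate gives 1 - Π(1 - p_i) and an AND gate gives Π p_i. The sketch below assumes independence and uses a small hypothetical tree with made-up probabilities purely for illustration.

```python
import math

def or_gate(probs):
    """Output event occurs if any input event occurs (independent inputs)."""
    return 1 - math.prod(1 - p for p in probs)

def and_gate(probs):
    """Output event occurs only if all input events occur (independent inputs)."""
    return math.prod(probs)

# Hypothetical tree: the system fails if the single power supply fails
# OR both redundant processors fail.
p_power = 1e-3
p_cpu = 1e-2

p_top = or_gate([p_power, and_gate([p_cpu, p_cpu])])
print(f"P(top event) = {p_top:.4e}")
```

With these numbers the top-event probability is about 1.1e-3, almost all of it from the single power supply; the duplicated processors contribute only 1e-4, which points to where redundancy would pay off next.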
Using fault avoidance in the design process
- Fault avoidance
  - How to produce fault-free systems
  - Fault avoidance = fault prevention + fault removal
- Fault avoidance protects against
  - Design mistakes, implementation mistakes, …
- Fault avoidance approaches
  - Various design reviews
    - Documentation is very important when teamwork is used
  - Adherence to design rules
  - Shielding against external disturbances
  - Quality control checks