Safety-Critical Systems 3: Hardware/Software
T 79.232
Ilkka Herttua

Current situation / critical systems
Based on the data on recent failures of critical systems, the following can be concluded:
a) Failures are becoming more and more distributed and are often nation-wide (e.g. commercial systems, such as denial of credit card authorisation).
b) The source of failure is more rarely in hardware (physical faults) and more frequently in system design or end-user operation / interaction (software).
c) The harm caused by failures is mostly economic, but sometimes health and safety concerns are also involved.
d) Failures can impact many different aspects of dependability (dependability = the ability to deliver service that can justifiably be trusted).

Examples of computer failures in critical systems

Driving force: federation
• Safety-related systems have traditionally been based on the idea of federation: a failure of any piece of equipment should be confined and should not cause the collapse of the entire system.
• When computers were introduced to safety-critical systems, the principle of federation was in most cases kept in force.
• As a result of applying federation, the Boeing 757 / 767 flight management control system has 80 distinct microprocessors (300 if redundancy is taken into account). Although this number of microprocessors is no longer too expensive, the principle of federation causes other problems.

Hardware Faults
Intermittent faults
- Fault occurs and recurs over time (loose connector)
Transient faults
- Fault occurs and may not recur (lightning)
- Electromagnetic interference
Permanent faults
- Fault persists / physical processor failure (design fault: over-current)

Fault Tolerance
• Fault-tolerant hardware
- Achieved mainly by redundancy
Redundancy
- Adds cost, weight, power consumption, complexity
Other means:
- Improved maintenance, single system with better materials (higher MTBF)

Redundancy types
Active redundancy:
- Redundant units are always operating.
Dynamic redundancy (standby):
- Failure has to be detected
- Changeover to another module

Hardware redundancy techniques
Active techniques:
- Parallel (k of N)
- Voting (majority/simple)
Standby:
- Operating: hot standby
- Non-operating: cold standby
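
The two active techniques can be illustrated with a minimal Python sketch. The function names `majority_vote` and `k_of_n_ok` are hypothetical and not from the slides; the sketch assumes each redundant unit reports a comparable value and that a missing majority should be handled by the caller (for example by moving to a safe state).

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of redundant units.

    Raises ValueError when no value has more than half of the votes,
    so the caller can decide how to reach a safe state.
    """
    if not outputs:
        raise ValueError("no unit outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among redundant units")
    return value


def k_of_n_ok(unit_healthy, k):
    """Parallel (k of N): the system works while at least k units work."""
    return sum(1 for healthy in unit_healthy if healthy) >= k
```

With three units, `majority_vote([10, 10, 11])` masks the single deviating unit, while `k_of_n_ok([True, False, True], 2)` reports that a 2-of-3 arrangement is still operational.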

Reliability prediction
• Electronic components
- Based on probability and statistics
- MIL-Handbook 217: experimental data on actual device behaviour
- Manufacturer information and allocated circuit types
- Bathtub curve: burn-in, useful life, wear-out

Reliability calculation for system
• MTTF (mean time to failure): average time for which the system would operate before the first failure
• MTTR (mean time to repair): time to get the system back in service again
• MTBF (mean time between failures): MTBF = MTTF + MTTR
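
As a small worked example of the relation MTBF = MTTF + MTTR, the Python sketch below computes MTBF and, as an addition not stated on the slide, the usual steady-state availability MTTF / (MTTF + MTTR). The function names and the example figures are assumptions chosen for illustration.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Fraction of time in service (standard definition, not from the slide)."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


# Illustrative figures only: 990 h of operation, 10 h to repair.
example_mtbf = mtbf(990, 10)
example_availability = availability(990, 10)
```

For these figures the MTBF is 1000 hours and the system is in service 99 % of the time; shortening the repair time raises availability without changing the MTTF.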

Safety-Critical Hardware
Fault detection:
- Routines to check that hardware works
- Signal comparisons
- Information redundancy: parity check etc.
- Watchdog timers
- Bus monitoring: check that the processor is alive
- Power monitoring
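
Information redundancy by parity check can be sketched in a few lines of Python. This is an illustrative even-parity scheme with hypothetical function names, where the parity bit is appended as the least significant bit; real hardware would do this per memory word or bus transfer.

```python
def even_parity_bit(data: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(data).count("1") % 2


def encode(data: int) -> int:
    """Append the parity bit as the least significant bit."""
    return (data << 1) | even_parity_bit(data)


def check(word: int) -> bool:
    """True when the word (data plus parity bit) has an even number of 1 bits."""
    return bin(word).count("1") % 2 == 0
```

A single flipped bit makes `check` fail, but two flipped bits cancel out and go unnoticed, which is why parity is usually combined with the other detection measures listed above.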

Safety-Critical Hardware
Possible hardware: COTS microprocessors
- No safety firmware, least assurance
- Redundancy improves matters, but common failures are still possible
- Fabrication failures, microcode and documentation errors
- Use components which have a history and statistics.

Safety-Critical Hardware
Specialist microprocessors
- Collins Avionics/Rockwell AAMP2
- Used in Boeing 747-400 (30+ pieces)
- High cost: bench testing, documentation, formal verification
- Other models: SPARC V7, TSC695E, ERC32 (ESA radiation-tolerant), 68HC908GP32 (airbag)

Safety-Critical Hardware
Programmable Logic Controllers (PLC)
• Contain a power supply, interface and one or more processors
• Designed for high MTBFs
• Firmware
• Program stored in EEPROMs
• Programmed with ladder or function block diagrams

Safety-Critical Software
Correct program:
- Normally iteration is needed to develop a working solution (writing code, testing and modification).
- In a non-critical environment, code is accepted when the tests are passed.
- Testing is not enough for a safety-critical application; it needs an assessment process: dynamic/static testing, simulation, code analysis and formal verification.

Safety-Critical Software
Dependable software:
- Process for development
- Work discipline
- Well documented
- Quality management
- Validated/verified

Safety-Critical Software
Safety-critical programming language:
- Logical soundness: unambiguous definition of the language (no dialects, as there are of C++)
- Simple definition: complexity can lead to errors in compilers or other support tools
- Expressive power: the language shall make it possible to express domain features efficiently and easily
- Security of definition: violations of the language definition shall be detected
- Verification: the language supports verification, i.e. proving that the produced code is consistent with the specification
- Memory/time constraints: stack, register and memory usage are controlled

Safety-Critical Software
Faults:
- Requirements defects: failure of the software requirements to specify the environment in which the software will be used, or ambiguous requirements
- Design defects: not satisfying the requirements, or documentation defects
- Code defects: failure of the code to conform to the software design

Safety-Critical Software
Faults:
- Subprogram effects: a called subprogram may change the definition of a variable.
- Definition aliasing: several names refer to the same storage location.
- Initialisation failures: variables are used before they are assigned values.
- Memory management: buffer, stack and memory overflows
- Expression evaluation errors: divide-by-zero / arithmetic overflow
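
Some of these fault classes are easy to reproduce. The Python sketch below is illustrative only, with hypothetical function names: it shows a subprogram side effect acting through an alias, a side-effect-free alternative, and an explicit divide-by-zero guard. Python raises an error for unassigned variables and does not overflow integers, so those fault classes are not shown.

```python
def scale_in_place(samples, factor):
    """Subprogram with a side effect: it modifies the caller's list."""
    for i in range(len(samples)):
        samples[i] *= factor
    return samples


def scale_copy(samples, factor):
    """Side-effect-free variant: returns a new list, input is untouched."""
    return [s * factor for s in samples]


def safe_divide(numerator, denominator, fallback):
    """Guard against divide-by-zero with an explicit fallback value."""
    if denominator == 0:
        return fallback
    return numerator / denominator


readings = [1, 2, 3]
alias = readings            # aliasing: two names, one storage location
scale_in_place(alias, 10)   # the call changes `readings` as well
fresh = scale_copy(readings, 2)  # `readings` is not changed by this call
```

After the in-place call, `readings` has changed even though only `alias` was passed; the copying variant avoids that hidden coupling at the cost of extra memory.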

Safety-Critical Software
Language comparison:
- Structured assembler (wild jumps, exhaustion of memory, well understood)
- Ada (wild jumps, data typing, exception handling, separate compilation)
- Subset languages: CORAL, SPADE and Ada (Alsys CSMART Ada kernel)
- Validated compilers for Pascal and Ada
- Available expertise: common languages give higher productivity and fewer mistakes, but C is still not appropriate.

Safety-Critical Software
Languages used:
- Boeing uses mostly Ada, but about 75 languages are still used for the 747-400.
- ESA mandated Ada for mission-critical systems.
- NASA Space Station in Ada, some systems with C and assembler
- Car ABS systems with assembler
- Train control systems with Ada
- Medical systems with Ada and assembler
- Nuclear reactor core and shutdown systems with assembler, migrating to Ada

Safety-Critical Software
Tools
- Highly reliable, validated tools are required: faults in a tool can result in faults in the safety-critical software.
- Widespread tools are better tested
- Use a confirmed process for using the tool
- Analyse the output of the tool: static analysis of the object code
- Use alternative products and compare results
- Use different tools (diversity) to reduce the likelihood of wrong test results.

Safety-Critical Software
Designing principles
- Use hardware interlocks before computer/software
- New software features add complexity; try to keep software simple
- Plan for avoiding human error: unambiguous human-computer interface
- Removal of hazardous modules (Ariane 5 unused code)

Safety-Critical Software
Designing principles
- Add barriers: hardware/software locks for critical parts
- Minimise single points of failure: increase safety margins, exploit redundancy and allow recovery.
- Isolate failures: don't let things get worse.
- Fail-safe: panic shutdowns, watchdog code
- Avoid common-mode failures: use diversity (different programmers, n-version programming)
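
The watchdog idea behind the fail-safe principle can be sketched as follows. This is a simplified software model with a hypothetical `Watchdog` class; in practice the watchdog is usually an independent hardware timer, and the clock is injected here only so that the behaviour can be checked deterministically.

```python
class Watchdog:
    """Software watchdog: expires unless kicked within `timeout` seconds."""

    def __init__(self, timeout, clock):
        self.timeout = timeout
        self.clock = clock          # callable returning the current time
        self.last_kick = clock()

    def kick(self):
        """Called periodically by the supervised task to show it is alive."""
        self.last_kick = self.clock()

    def expired(self):
        """True when the task has not kicked within the timeout."""
        return self.clock() - self.last_kick > self.timeout
```

A supervisor would poll `expired()` and trigger the panic shutdown when it returns true; passing `time.monotonic` as the clock gives wall-clock behaviour.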

Safety-Critical Software
Designing principles:
- Fault tolerance: recovery blocks; if one module fails, execute an alternative module.
- Don't rely on run-time systems
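
A recovery block can be sketched as a primary module, one or more alternates and an acceptance test. The Python below is a minimal illustration with hypothetical names (`recovery_block`, `fast_sqrt`, `accept`); it assumes the modules are pure functions, so the state restoration that a full recovery block performs before running an alternate is omitted.

```python
import math


def recovery_block(acceptance_test, primary, *alternates):
    """Build a function that tries the primary module, then each alternate,
    returning the first result that passes the acceptance test."""
    def run(x):
        for module in (primary, *alternates):
            try:
                result = module(x)
            except Exception:
                continue  # an exception counts as a module failure
            if acceptance_test(x, result):
                return result
        raise RuntimeError("all modules failed the acceptance test")
    return run


def fast_sqrt(x):
    """Hypothetical primary module with a deliberate design fault."""
    return x / 2.0


def accept(x, result):
    """Acceptance test: the result must be a non-negative square root of x."""
    return result >= 0 and abs(result * result - x) < 1e-9


safe_sqrt = recovery_block(accept, fast_sqrt, math.sqrt)
```

Calling `safe_sqrt(9.0)` rejects the faulty primary result and falls back to `math.sqrt`. The quality of the acceptance test bounds the protection: a fault it cannot detect passes straight through.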

Safety-Critical Software
Techniques/Tools:
- Fault prevention: preventing the introduction or occurrence of faults by using design-supporting tools (UML with a CASE tool)
- Fault removal: testing, debugging and code modification

Safety-Critical Software
Faults:
- Faults in software tools (development/modelling) can result in system faults.
- Techniques for software development (language/design notation) can have a great impact on the performance of the people involved and also determine the likelihood of faults.
- The characteristics of the programming systems and their runtime determine how great the impact of possible faults on the overall software subsystem can be.

Safety-Critical Software
Architectural design: layered structure
1 - High-level command and control functions
2 - Intermediate-level routines
3 - I/O routines and device drivers

Safety-Critical Software
Architectural design:
- Design is done after the required functions have been partitioned between hardware and software.
- Complete specification of the architecture with components, data structures and interfaces (messages/protocols)

Safety-Critical Software
Architectural design:
- Test plan for each module (testability)
- Human-computer interface
- A change control system is needed for inconsistencies and inadequacies within the specification.
- Verification of the architectural design against the specification
- Software partitioning: modularity aids comprehension and isolation (fault limiting)

Safety-Critical Software
Reduction of hazardous conditions: summary
- Simplify: code contains only the minimum features, with no unnecessary or undocumented features and no unused executable code
- Diversity: data and control redundancy
- Multi-version programming: a shared specification leads to common-mode failures, and synchronisation code increases complexity

Safety-Critical Software
Home assignments 3:
- 6.42 (fault-tolerant system)
- 7.15 (reliability model)
- 9.17 (reuse of software)
Please email to herttua@eurolock.org by 24 February 2004

Home assignments 1&2
• 1.12 (primary, functional and indirect safety)
• 2.4 (unavailability)
• 3.23 (fault tree)
• 4.18 (tolerable risk)
• 5.10 (incompleteness within specification)
Email before 24 February to herttua@eurolock.org
11 and 18 February: Case Studies / Teemu Tynjälä