Future Trends in Process Safety Prof Nancy Leveson
Future Trends in Process Safety
Prof. Nancy Leveson
Engineering Systems
Aeronautics and Astronautics
MIT
– You’ve carefully thought out all the angles
– You’ve done it a thousand times
– It comes naturally to you
– You know what you’re doing; it’s what you’ve been trained to do your whole life
– Nothing could possibly go wrong, right?
Think Again
Topics
• Lessons from Texas City
• New factors in process accidents
• Safety as a control problem
• Conclusions
Leadership
• Safety requires passionate and effective leadership
• Tone is set at the top of the organization
• Not just sloganeering but real commitment
• Setting priorities
  – Adequate resources assigned
  – A designated, high-ranking leader
• Safety and productivity do not conflict if one takes a long-term view
Managing and Controlling Safety
• Need a clear definition of expectations, responsibilities, authority, and accountability at all levels of the safety control structure
• The entire control structure must together enforce the system safety property
• Unsafe changes must be eliminated or controlled through system design, or detected and fixed before they lead to an accident
  – Planned changes (handled by the management of change, or MOC, process)
  – Unplanned changes
Visibility and Communication
• Downward and upward communication
  – Requires a positive, open, trusting environment
  – Needs effective measurement and monitoring of process safety performance (e.g., injury rates are not useful process safety measures and are misleading)
• Avoid a “culture of denial”
• If managers do not want to hear, people stop talking
Information and Appropriate Feedback
• Good accident/incident investigation and follow-through
  – Identification and correction of systemic causal factors
  – Ensuring thorough reporting of incidents and near misses
• Thorough hazard identification, analysis, and control
• An effective process safety audit system to ensure adequate process safety performance
Oversight and Control
• Results of operating experience, process hazard analyses, audits, near misses, and accident investigations must be used to improve process operations and the process safety management system
• Promptly address, and track to completion, the deficiencies found during assessments, audits, inspections, and incident investigations
Fumbling for his recline button, Ted unwittingly instigates a disaster
Process Safety vs. Personal Safety
• All behavior is influenced by the context in which it occurs
  – Both physical and social context
  – Personal safety focuses on changing individual behavior
  – Process (system) safety focuses on the design of the system in which behavior occurs
• To understand why process accidents occur and to prevent them, we need to:
  – Understand the current context (system design)
  – Create a design that effectively ensures safety
The Enemies of Safety
• Complacency
• Arrogance
• Ignorance
Factors in Complacency
• Discounting risk
• Over-relying on redundancy
• Unrealistic risk assessment
• Ignoring low-probability, high-consequence events
• Assuming risk decreases over time
• Ignoring warning signs
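One way over-reliance on redundancy shows up in numbers is the assumption that redundant channels fail independently. The sketch below uses the beta-factor approximation for common-cause failures (a standard reliability technique, not one named in this talk); the function name and all probabilities are hypothetical and chosen only for illustration.

```python
def two_channel_failure_prob(p: float, beta: float) -> float:
    """Approximate probability that both channels of a 1-out-of-2 system fail.

    p    -- failure probability of one channel (hypothetical value)
    beta -- fraction of channel failures that come from a shared (common) cause
    """
    independent = ((1.0 - beta) * p) ** 2  # both channels fail from separate causes
    common_cause = beta * p                # one shared cause disables both at once
    return independent + common_cause


p = 1e-3
naive = two_channel_failure_prob(p, beta=0.0)      # full independence assumed
realistic = two_channel_failure_prob(p, beta=0.05)  # 5% common-cause share
print(f"independence assumed: {naive:.2e}")
print(f"5% common cause:      {realistic:.2e}")
print(f"ratio: {realistic / naive:.0f}")
```

With these made-up values, a 5% common-cause share raises the estimate from about 1.0e-6 to about 5.1e-5, roughly 50 times higher, because the shared-cause term dominates the squared independent term.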
Topics
• Lessons from Texas City
• New factors in process accidents
  – New technology
  – System accidents
  – New types of human error
• Safety as a control problem
• Conclusions
Accident with No Component Failures
Types of Accidents
• Component Failure Accidents
  – Single or multiple component failures
  – Usually assume random failure
• System Accidents
  – Arise in interactions among components
  – Related to interactive complexity and tight coupling
  – Exacerbated by introduction of computers and software
Safety vs. Reliability
• Safety and reliability are NOT the same
  – Sometimes increasing one can even decrease the other
  – Making all the components highly reliable will have no impact on system accidents
• For relatively simple, electro-mechanical systems with primarily component failure accidents, reliability engineering can increase safety
• For complex systems, we need something more
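The claim that perfectly reliable components do not prevent system accidents can be made concrete with a toy simulation. This is a hypothetical sketch, loosely modeled on the reactor constraint stated later in the talk (cooling water to the reflux condenser, catalyst to the reactor): each valve does exactly what it is commanded, so nothing ever "fails," yet one command ordering passes through a hazardous state and the other does not.

```python
from dataclasses import dataclass


@dataclass
class Valve:
    """Hypothetical, perfectly reliable valve: it always does exactly what it is told."""
    name: str
    is_open: bool = False

    def command(self, open_it: bool) -> None:
        self.is_open = open_it  # no failure modes are modeled at all


def hazardous(water: Valve, catalyst: Valve) -> bool:
    # System-level hazard: catalyst flowing while no cooling water is flowing.
    return catalyst.is_open and not water.is_open


def run(sequence: list[tuple[str, bool]]) -> list[bool]:
    """Apply a command sequence; report whether the hazard exists after each step."""
    valves = {"water": Valve("water"), "catalyst": Valve("catalyst")}
    states = []
    for name, open_it in sequence:
        valves[name].command(open_it)
        states.append(hazardous(valves["water"], valves["catalyst"]))
    return states


# Every component works perfectly in both runs; only the interaction differs.
print(run([("catalyst", True), ("water", True)]))  # → [True, False]
print(run([("water", True), ("catalyst", True)]))  # → [False, False]
```

Raising the valves' reliability cannot change the first result; only a constraint on how the components interact can.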
Humans in Process Safety
• Human error is usually defined as deviation from normative procedures, but operators always deviate from standard procedures
  – Normative vs. effective procedures
  – Sometimes violation of rules has prevented accidents
• Human behavior cannot be modeled effectively by decomposing it into individual decisions and acts and studying it in isolation from
  – Physical and social context
  – The value system in which it takes place
  – The dynamic work process
• Less successful actions are a natural part of operators’ search for optimal performance
New Operator Roles and Errors
• High-tech automation is changing the cognitive demands on operators
  – Supervising rather than directly monitoring
  – Doing more cognitively complex decision-making
  – Dealing with complex, mode-rich systems
  – Increasing need for cooperation and communication
• Human-factors experts are complaining about technology-centered automation
  – Designers focus on technical issues, not on supporting operator tasks
  – This leads to “clumsy” automation
• Errors are changing, e.g., errors of omission vs. commission
Impacts on System Design
• Design for error tolerance
• Alarm management (managing by exception)
• Matching tasks to human characteristics
• Design to reduce human errors
• Providing information and feedback
• Training and maintaining skills
Topics
• Lessons from Texas City
• New factors in process accidents
• Safety as a control problem
  – New approaches to hazard analysis
  – Design for safety
  – Risk analysis and management
• Conclusions
STAMP: A Systems Model of Accident Causality
• Systems-Theoretic Accident Model and Processes
  – Safety treated as a control problem, not a “failure” problem
  – Accidents are not simply an event or chain of events
    • Involve a complex, dynamic process
    • Arise from interactions among humans, machines, and the environment
A Broad View of “Control”
• Does not imply the need for a “controller”
  – Component failures and dysfunctional interactions may be “controlled” through design (e.g., redundancy, interlocks, fail-safe design) or through process
    • Manufacturing processes and procedures
    • Maintenance processes
    • Operations
• Does imply the need to enforce safety constraints in some way
STAMP (2)
• Safety is an emergent property that arises when system components interact with each other within a larger environment
  – A set of safety constraints related to the behavior of system components enforces that property
  – Accidents occur when interactions among system components violate those constraints
  – The goal of process (system) safety engineering is to identify the safety constraints and enforce them in the system design
Example Safety Constraints
• Build safety in by enforcing constraints on behavior
• A controller contributes to accidents not by “failing” but by:
  1. Not enforcing safety-related constraints on behavior
  2. Commanding behavior that violates safety constraints
System Safety Constraint: Water must be flowing into the reflux condenser whenever catalyst is added to the reactor
Software (Controller) Safety Constraint: Software must always open the water valve before the catalyst valve
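A controller that enforces the software safety constraint above might look like the following minimal sketch. The class and method names are hypothetical, and it assumes (unrealistically) that an open valve means flow is present, with no lag and no valve failures; a real design would rely on flow feedback rather than on the commanded state.

```python
class ReactorController:
    """Hypothetical controller enforcing: open the water valve before the catalyst valve."""

    def __init__(self) -> None:
        # Assumption: commanded valve state equals actual flow (no lag, no failures).
        self.water_open = False
        self.catalyst_open = False

    def open_water(self) -> None:
        self.water_open = True

    def close_water(self) -> None:
        # Stopping the water while catalyst flows would violate the system constraint,
        # so the catalyst is shut off first.
        self.catalyst_open = False
        self.water_open = False

    def open_catalyst(self) -> None:
        if not self.water_open:
            # Refuse to command behavior that violates the safety constraint.
            raise RuntimeError("safety constraint: open water valve before catalyst valve")
        self.catalyst_open = True

    def system_constraint_holds(self) -> bool:
        # System constraint: catalyst flow implies water flow to the reflux condenser.
        return (not self.catalyst_open) or self.water_open


ctl = ReactorController()
try:
    ctl.open_catalyst()  # rejected: water is not yet flowing
except RuntimeError as err:
    print("rejected:", err)
ctl.open_water()
ctl.open_catalyst()
print(ctl.system_constraint_holds())  # → True
```

The controller addresses both ways a controller can contribute to accidents: it never issues the unsafe command, and it enforces the constraint when shutting down as well as when starting up.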
STAMP (3)
• Systems are not static
  – A socio-technical system is a dynamic process continually adapting to achieve its ends and to react to changes in itself and its environment
  – Systems and organizations migrate toward accidents (states of high risk) under cost and productivity pressures in an aggressive, competitive environment
  – Preventing accidents requires designing a control structure to enforce constraints on system behavior and adaptation that ensures safety
Example Control Structure
Controlling and managing dynamic systems requires visibility and feedback
Figure: a Controller, holding a Model of Process, issues Control Actions to the Controlled Process and receives Feedback from it
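The control loop in the figure can be sketched in a few lines. In this hypothetical example (a tank-level controller, with invented names and values) the controller never sees the real process directly: it chooses control actions from its process model, and only feedback keeps that model current.

```python
from dataclasses import dataclass


@dataclass
class Tank:
    """Controlled process: a tank whose level changes with inflow and outflow."""
    level: float
    outflow: float = 1.0

    def step(self, inflow: float) -> float:
        self.level += inflow - self.outflow
        return self.level  # feedback: the measured level


class LevelController:
    """Controller that acts on its model of the process, not on the process itself."""

    def __init__(self, setpoint: float, max_inflow: float) -> None:
        self.setpoint = setpoint
        self.max_inflow = max_inflow
        # Process model: beliefs about the tank, kept current only by feedback.
        self.believed_level = 0.0
        self.believed_outflow = 1.0

    def control_action(self) -> float:
        demand = self.setpoint - self.believed_level + self.believed_outflow
        return min(max(demand, 0.0), self.max_inflow)

    def receive_feedback(self, measured_level: float) -> None:
        self.believed_level = measured_level


def simulate(steps: int) -> float:
    tank = Tank(level=0.0)
    ctl = LevelController(setpoint=10.0, max_inflow=3.0)
    for _ in range(steps):
        action = ctl.control_action()   # controller -> control action
        measured = tank.step(action)    # controlled process responds
        ctl.receive_feedback(measured)  # feedback -> updates the process model
    return tank.level


print(simulate(20))  # → 10.0
```

Because the model matches the process here, the level settles at the setpoint. Removing the `receive_feedback` call leaves the controller acting on a belief that no longer reflects reality.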
Relationship Between Safety and Process Models
• Accidents occur when models do not match the process and
  – Incorrect control commands are given
  – Correct ones are not given
  – Correct commands are given at the wrong time (too early, too late)
  – Control stops too soon
(Note the relationship to system accidents)
Relationship Between Safety and Process Models (2)
• How do they become inconsistent?
  – Wrong from the beginning
  – Missing or incorrect feedback
  – Not updated correctly
  – Time lags not accounted for
• Resulting in
  – Uncontrolled disturbances
  – Unhandled process states
  – Inadvertently commanding the system into a hazardous state
  – Unhandled or incorrectly handled system component failures
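One of the mechanisms above, missing feedback, can be shown with a hypothetical tank-filling loop. Nothing in the sketch fails in the conventional sense; after feedback stops, the controller keeps acting on a stale model and commands the tank past its capacity. All names, values, and the capacity are invented for illustration.

```python
from typing import Optional


def simulate(steps: int, feedback_lost_after: Optional[int] = None,
             capacity: float = 12.0) -> tuple[float, bool]:
    """Fill a tank toward a setpoint; optionally stop delivering feedback.

    Returns the final true level and whether it ever exceeded capacity (the hazard).
    """
    setpoint, max_inflow, outflow = 10.0, 3.0, 1.0
    true_level = 0.0
    believed_level = 0.0  # the controller's process model
    overflowed = False
    for t in range(steps):
        # The control action is computed from the model, not from the true level.
        inflow = min(max(setpoint - believed_level + outflow, 0.0), max_inflow)
        true_level += inflow - outflow
        overflowed = overflowed or true_level > capacity
        if feedback_lost_after is None or t < feedback_lost_after:
            believed_level = true_level  # feedback keeps the model consistent
        # Otherwise the model silently goes stale.
    return true_level, overflowed


print(simulate(10))                         # → (10.0, False)
print(simulate(10, feedback_lost_after=2))  # → (20.0, True)
```

In the second run the controller still believes the level is 4.0, so it keeps commanding full inflow: a correct controller, given an inconsistent model, drives the process into a hazardous state.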
Modeling Accidents Using STAMP
Two types of models are used:
1. Static safety control structure
2. Behavioral dynamics (system dynamics)
   – Dynamic processes behind change in the safety control structure, i.e., why it may change (e.g., degrade) over time
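System dynamics models are built from stocks and flows and integrated over time. The toy one-stock model below is not Leveson's model; the flows and parameters are invented to show the mechanics. Safety effort S is drained by production pressure and partly restored by a response term, and the result is a slow drift rather than a sudden change.

```python
def simulate_safety_effort(pressure: float, response: float, years: float,
                           dt: float = 0.01, start: float = 1.0) -> float:
    """Euler-integrate one stock: the level of safety effort S in [0, 1].

    Hypothetical flows (illustrative only, not a calibrated model):
      erosion     = pressure * S        (cost/productivity pressure drains effort)
      restoration = response * (1 - S)  (incidents and audits push effort back up)
    """
    s = start
    for _ in range(round(years / dt)):
        erosion = pressure * s
        restoration = response * (1.0 - s)
        s += dt * (restoration - erosion)  # net flow changes the stock
    return s


for pressure in (0.0, 0.2, 0.5):
    level = simulate_safety_effort(pressure, response=0.1, years=50)
    print(f"pressure={pressure}: safety effort after 50 years = {level:.3f}")
```

Setting the net flow to zero shows that the stock settles near response / (response + pressure). Under any sustained pressure, effort declines gradually from its starting level, which is the kind of degradation over time the behavioral model is meant to expose.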
Simplified System Dynamics Model of Columbia Accident
Uses for STAMP
• Basis for new, more powerful hazard analysis techniques (STPA)
• Safety-driven design
• More comprehensive accident/incident investigation and root cause analysis
• Organizational and cultural risk analysis
  – Defining safety metrics and performance audits
  – Designing and evaluating potential policy and structural improvements
  – Identifying leading indicators of increasing risk (“canary in the coal mine”)
• New risk management tools
• New holistic approaches to security
STAMP-Based Hazard Analysis (STPA)
• Supports a safety-driven design process where
  – Hazard analysis influences and shapes early design decisions
  – Hazard analysis is iterated and refined as the design evolves
• Goals (same as any hazard analysis)
  – Identification of system hazards and the related safety constraints necessary to ensure acceptable risk
  – Accumulation of information about how the safety constraints can be violated, which is used to eliminate, reduce, and control hazards in system design, development, manufacturing, and operations
STPA (2)
• STPA process
  – Starts with identifying system requirements and design constraints necessary to maintain safety
  – Then STPA assists in
    • Top-down refinement into requirements and safety constraints on individual components
    • Identifying scenarios in which safety constraints can be violated
    • Using results to eliminate or control hazards in design, operations, etc.
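A mechanical first step in identifying how safety constraints can be violated is to pair each control action with each way control can be inadequate (the four listed earlier under the relationship between safety and process models). This minimal sketch only generates the candidate table. Deciding which entries are hazardous, and in what process context, remains the analyst's work. The enum and function names are hypothetical.

```python
from enum import Enum


class UnsafeControlType(Enum):
    # The four kinds of inadequate control listed in the talk.
    PROVIDED_UNSAFE = "incorrect or unsafe control command given"
    NOT_PROVIDED = "required control command not given"
    WRONG_TIMING = "correct command given too early or too late"
    STOPPED_TOO_SOON = "control stopped too soon"


def candidate_ucas(control_actions: list[str]) -> list[tuple[str, UnsafeControlType]]:
    """Pair every control action with every type of inadequate control.

    Each pair is a prompt: in what context would this be hazardous?
    """
    return [(action, kind) for action in control_actions for kind in UnsafeControlType]


# Hypothetical control actions for the reactor example.
for action, kind in candidate_ucas(["open catalyst valve", "open water valve"]):
    print(f"{action}: {kind.value}")
```

For the reactor, the entry "open catalyst valve: correct command given too early or too late" is the one that leads directly to the software constraint that water must be opened first.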
© Copyright Nancy Leveson, Aug. 2006
Comparison of STPA with Traditional HA Techniques
• Top-down (vs. bottom-up, like FMECA)
• Considers more than just component failures and failure events (includes these but is more general)
• Provides guidance in doing the analysis (vs. FTA)
• Handles dysfunctional interactions and system accidents, software, management, etc.
Comparisons (2)
• Concrete model (not just in the analyst’s head)
  – Not the physical structure (as in HAZOP) but the control (functional) structure
  – General model of inadequate control (based on control theory)
• HAZOP guidewords are based on a model of accidents being caused by deviations in system variables
• STPA includes the HAZOP model but is more general
• Fault trees concentrate on component failures and miss system accidents
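For contrast with STPA, HAZOP generates its prompts by combining guidewords with process variables. The sketch uses the common basic guideword set (no, more, less, as well as, part of, reverse, other than); the function name is hypothetical, and real studies also apply timing guidewords and discard meaningless combinations.

```python
GUIDEWORDS = ["no", "more", "less", "as well as", "part of", "reverse", "other than"]


def hazop_deviations(parameters: list[str]) -> list[str]:
    """Combine each guideword with each process parameter to form deviations."""
    return [f"{word} {param}" for param in parameters for word in GUIDEWORDS]


for deviation in hazop_deviations(["flow", "temperature"]):
    print(deviation)  # e.g. "no flow", "more temperature", ...
```

Every prompt is a deviation in a single system variable. A hazard such as opening the catalyst valve before the water valve, where each flow is individually correct but the order is wrong, is not a deviation in any one variable, so this structure makes it easy to miss.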
Risk Analysis and Risk Management
Figure: Effectiveness and Credibility of ITA (Independent Technical Authority), plotted over time
Figure: System Technical Risk, plotted over time
Identifying Lagging vs. Leading Indicators
Figure: The number of waivers issued is a good indicator of risk in Space Shuttle operations, but it lags the rapid increase in risk (plotted over time)
Figure: The number of incidents under investigation is a better leading indicator (plotted over time)
Managing Tradeoffs Among Risks
• Good risk management requires understanding tradeoffs among
  – Schedule
  – Cost
  – Performance
  – Safety
Example: Schedule Pressure and Safety Priority
Figure: outcomes for high and low schedule pressure and high and low safety priority
1. Overly aggressive schedule enforcement has little effect on completion time (<2%) and cost, but has a large negative impact on safety
2. Priority of safety activities has a large positive impact, including a positive cost impact (less rework)
Conclusions
• Future needs for safety in the process industry:
  – Differentiation between process safety and personal (occupational) safety
  – Improved safety culture management
  – New approaches to handle
    • Advanced technology (particularly digital technology)
    • System accidents and complexity
    • New types of human error
• Using a control-based (vs. failure-based) model of causality expands our power to prevent process accidents