Digital Modifications and Configuration Control of Digital Systems

  • Slides: 31
John Connelly
Exelon Generation
Engineering Manager – Capital Projects

OPEX – Digital Challenges
✓ Implementation of digital modifications is an industry-wide issue:
  • IER 11-02 identifies an adverse trend in SCRAMs between 2005 and 2010
  • 43 SCRAMs (35%) were the result of flawed implementation of design changes involving digital technology
✓ INPO 10-008 examined events from 2003 to 2007:
  • 17 SCRAMs from software malfunctions resulted in a loss of 1.6 million MWh
  • 24 SCRAMs from hardware malfunctions resulted in a loss of 3.1 million MWh
  • Significant operational and safety challenges
  • A modest $50 / MWh yields an industry-wide cost of ~$200 M
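The cost estimate on this slide can be checked with simple arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative. The unrounded product comes out somewhat above the slide's "~$200 M", which appears to be a rounded order-of-magnitude figure.

```python
# Figures quoted on the slide (INPO 10-008, events from 2003 to 2007).
software_mwh_lost = 1.6e6   # MWh lost to software malfunctions
hardware_mwh_lost = 3.1e6   # MWh lost to hardware malfunctions
price_per_mwh = 50          # dollars; the slide's "modest" assumed price

total_mwh_lost = software_mwh_lost + hardware_mwh_lost
total_cost = total_mwh_lost * price_per_mwh

print(f"{total_mwh_lost / 1e6:.1f} million MWh lost")
print(f"${total_cost / 1e6:.0f} M estimated cost")
```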

Common Threads
Irrespective of utility, most events share two common themes:
  • Flaws in the processes by which digital modifications are implemented
  • Inadequate knowledge of the complex technologies and techniques common to nearly all digital modifications

Changes To INPO Evaluation Process - CM
✓ Performance Objectives for the Design Change Process (CM.3) are under revision
✓ Future INPO evaluations will include a review of the processes by which you manage the unique characteristics of digital technology. This includes:
  • Development and control of procurement specifications
  • Software
  • Vendor interfaces
  • Testing
  • Validation
  • Failure Modes and Effects Analysis

Changes To INPO Evaluation Process - Knowledge
✓ Application of digital technology requires very different and specialized skills to implement correctly
✓ INPO ACAD 98-04, Rev 2 introduces the entity of “Digital Engineer”
✓ Engineers assigned to work independently on digital projects must be qualified to ACAD 98-04, Rev 2 by March 2013
✓ Training evaluations conducted after March 2013 will be in accordance with the requirements of ACAD 98-04, Rev 2

Knowledge and Process Inventory
✓ Digital technology, while superior in nearly every dimension to analog technology, requires very different competencies and processes:
  • Software engineering
  • Hardware design
  • Exception / Fault / Error Handling / Recovery
  • Networking
  • Cyber Security
  • Human Factors Engineering
  • Advanced analysis techniques (FMEA / SHA / CDR)
  • EMI / RFI
  • Interfacing systems knowledge
  • Plant Operations
  • Testing / Dynamic response analysis
  • Life-Cycle Management

Key Takeaway…
✓ Digital Is Different!
  • Engineering processes for “conventional” modifications do not, by themselves, provide an adequate defense against errors and events
  • Digital modifications require very different skills to implement correctly
  • Your design processes will be evaluated against this reality

Exelon Digital Modification Processes


Exelon Internal OPEX
✓ A series of events beginning in 2005 made it clear that improvement opportunities existed
✓ The Quad Cities Reactor Recirculation Adjustable Speed Drives (ASD) provide a representative example of the challenges:
  • Approximately 150 Issue Reports
  • Manual scram, power reductions and operational challenges
✓ Principal findings from CCA:
  • Latent design flaws in vendor products
  • FMEA did not detect design issues
  • Excessive reliance on vendors
  • Testing failed to uncover issues
✓ Similar experiences with other modifications

Redesigning the process at Exelon
✓ Formed Corporate Capital Projects Group (CPG) to oversee large, multi-site digital modifications (RRASD, DEH, MPT, TDFWP, BOP 7300…)
✓ Staffed with subject matter experts on digital technology
✓ CPG works closely with implementing engineers at the sites who manage the EC development process
✓ Advanced training provided to site and corporate digital I&C engineers to jump-start performance
✓ Procedures and processes revised to capture best practices – process improvements will continue indefinitely as practices mature

Exelon Digital Modification Process
✓ The existing Configuration Control process is now supplemented with procedures that address the unique attributes of digital technology:
  - Management Of Digital Modifications
  - Digital Design Considerations
  - Design Attributes For Digital Systems
  - Software Development
  - Digital Procurement Process
  - Factory Acceptance Testing
  - Cyber Security
✓ The process continues to evolve as Cyber Security requirements are implemented and additional best practices are identified

Typical Processes For Large Digital Projects


[Figure: Typical Project Lifecycle]

Procurement Specifications
✓ The act of fully defining detailed vendor requirements commensurate with project safety significance, operational risk and project scope
✓ Specifically identifying documentation and performance requirements for a given project including (but not limited to):
  - Verification and Validation (V&V) requirements
  - Software Quality Assurance measures
  - Hardware design requirements (including Single Point Vulnerabilities)
  - Failure Modes and Effects Analysis (FMEA) requirements
  - Software testing and validation requirements
  - Cyber Security requirements
  - Life Cycle Management (LCM) requirements
✓ Time invested in the development of a detailed procurement specification improves project execution by avoiding unbudgeted scope changes

The role of software design – A brief case study


Perfect Software Does Not Exist
✓ No system will ever be perfect no matter how rigorous the development process used or amount of money spent to develop and maintain it – humans develop software and humans will always make mistakes
✓ Highly automated systems effectively move the point of error from the user (Operations and Maintenance) to the programmer but human error still exists
✓ The Space Shuttle flight control system was arguably the most rigorously developed and tested control system ever conceived
  • 400,000 words (very small footprint compared to a modern DCS)
  • $100,000 per year in maintenance
  • Over the 25-year shuttle program, 16 Severity Level 1 software issues were identified – SL 1 issues are those that would result in the loss of the orbiter under the right conditions

How do software driven systems malfunction?
✓ Software malfunctions are systemic, not random
✓ In the absence of a hardware-induced fault, instructions will execute exactly as written, unerringly and without exception
✓ Software malfunctions require the simultaneous existence of two conditions:
  • An error must be present (often undetected)
  • An initiating event must occur
✓ Unless both conditions are satisfied, no malfunction will occur
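A minimal sketch of the two-condition idea, using a hypothetical scaling function that is not taken from any system discussed in this presentation. The latent error (no guard for identical range limits) is present on every call, but nothing goes wrong until an input supplies the initiating event.

```python
def percent_of_span(raw: float, low: float, high: float) -> float:
    """Scale a raw signal to percent of its configured range (hypothetical)."""
    # Latent error: no guard for high == low (e.g. a misconfigured range).
    return 100.0 * (raw - low) / (high - low)


# Normal inputs: the code executes exactly as written and gives correct results.
normal = percent_of_span(12.0, 4.0, 20.0)

# Initiating event: a configuration with identical limits exposes the error.
try:
    percent_of_span(12.0, 4.0, 4.0)
    failed = False
except ZeroDivisionError:
    failed = True

print(normal, failed)
```

The same function can run without incident for years; the malfunction appears only when the error and the initiating event coincide.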

A Representative Example From Aerospace
✓ The Event:
  • A completed commercial airliner is about to be delivered to the customer
  • A Factory Acceptance Test is being conducted by factory and customer personnel in which the parking brakes are applied and all four engines are taken to maximum continuous thrust
  • At this power setting and altitude (zero feet) the flight control system automatically selects “takeoff” mode as designed
  • The flight control system correctly recognizes that the wing surfaces are incorrectly configured for a takeoff and continuously sounds the Ground Proximity Warning (GPW) alarm as designed – this alarm is critical and cannot be silenced
  • A technician, irritated by the alarm and unable to silence it, trips the feed breaker for the GPW system knowing that this will de-energize the alarm
  • Ground proximity radar loses power and clears the zero altitude interlock
  • With the interlock cleared, the control system now concludes the plane is in the air and releases the brakes – this is a programmed behavior to prevent landing the aircraft with the brakes set
  • Plane immediately accelerates (no passengers or luggage and little fuel) and strikes the jet blast barrier at full power

[Figures: The results]

A Representative Example From Aerospace
✓ Software malfunction requires two conditions:
  • The error must be present and undetected:
    - This application software had been in service for years and “ground run-up” tests are somewhat routine
  • The initiating event must occur:
    - The loss of supply voltage to the GPW interlock caused the brakes to release exactly as they were programmed to do
    - The software development team never envisioned this combination of events
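The failure mode in this example can be sketched in a few lines. This is a hypothetical illustration only, not the aircraft's actual logic: the function names, the use of None to represent a de-energized sensor, and the "hold the last valid state" policy are all assumptions made for the sketch.

```python
from typing import Optional


def brakes_released_naive(radar_altitude_ft: Optional[float]) -> bool:
    """Release brakes whenever the 'on ground' interlock is not asserted."""
    # Latent error: a de-energized sensor (None) cannot assert the
    # interlock, so loss of power is indistinguishable from being airborne.
    on_ground = radar_altitude_ft is not None and radar_altitude_ft <= 0.0
    return not on_ground


def brakes_released_guarded(radar_altitude_ft: Optional[float],
                            last_valid_on_ground: bool) -> bool:
    """Same rule, but an invalid signal holds the last valid state."""
    if radar_altitude_ft is None:
        return not last_valid_on_ground
    return radar_altitude_ft > 0.0


print(brakes_released_naive(0.0))            # sensor healthy, on ground
print(brakes_released_naive(None))           # sensor de-energized
print(brakes_released_guarded(None, True))   # sensor de-energized, guarded
```

The naive version behaves correctly for every valid reading, which is why routine testing would not reveal the problem. Only the loss-of-signal case separates the two versions.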

Software evolves during the development process
  • Changes can invalidate previous testing or introduce new errors

The Interfaces To Cyber Security


Integration With Cyber Security Requirements
✓ The Cyber Security Rule (10 CFR 73.54) is a license condition that applies to any digital component that is:
  • Safety Related
  • Important To Safety (defined as reactivity impact)
  • Physical Security
  • Emergency Preparedness
  • Systems that support any of the above
  • Systems with pathways of connectivity to any of the above
✓ Significant synergies exist between the Digital I&C process and Cyber Security
✓ Consider the extent to which these processes are interconnected and aware of each other

[Figure: Cyber / Digital Relationship]

Testing Considerations


Factory Acceptance Testing
✓ Many test plans focus on “positive testing”, which confirms expected responses for a given set of inputs or stimulus conditions – informative but only to a point
✓ Negative testing focuses on verifying that you don’t get an unexpected response when you combine unusual stimuli or do something outside of normal operation – effectively it’s an attempt to trigger a malfunction, which can be very informative
✓ It’s nearly inevitable that over the life of a system, it will be operated in a way the designers never anticipated. Take advantage of unstructured testing opportunities (i.e. pre-FAT) to attempt to “break” the system early in the development cycle while there is ample opportunity to take corrective action for issues identified
✓ Process needs to involve Operators, System Engineers and SMEs
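The difference between positive and negative testing can be sketched against a small, hypothetical control function. The function, its 0-100 % demand range and the 1800 rpm full-scale value are assumptions for illustration, not part of any system described here.

```python
def select_pump_speed(demand_percent: float) -> float:
    """Map a demand signal (0-100 %) to a speed setpoint in rpm (hypothetical)."""
    # NaN fails this range comparison as well, so it is rejected too.
    if not (0.0 <= demand_percent <= 100.0):
        raise ValueError(f"demand out of range: {demand_percent!r}")
    return 1800.0 * demand_percent / 100.0


def rejects(value: float) -> bool:
    """Return True if the function refuses the input instead of acting on it."""
    try:
        select_pump_speed(value)
        return False
    except ValueError:
        return True


# Positive test: an expected input gives the expected response.
positive_ok = select_pump_speed(50.0) == 900.0

# Negative tests: unusual inputs must not produce a setpoint at all.
unusual_inputs = [-1.0, 100.1, float("nan"), float("inf")]
negative_ok = all(rejects(v) for v in unusual_inputs)

print(positive_ok, negative_ok)
```

A test plan containing only the positive case would pass even if the range check were missing; the negative cases are the ones that try to "break" the function.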

Modification Acceptance Testing
✓ Most modification issues are not with the systems themselves but rather with interfaces to installed plant hardware (power / hydraulics / supporting systems / actuators / protective devices / EMI / RFI…)
✓ The Mod Acceptance Test (MAT) is the very first time the system will be tested in the plant environment. In some cases it will be the first time that the system is connected to any physical components and therefore represents the first opportunity to identify and correct interface issues – care should be taken to exercise every interface to the extent possible and as early as possible
✓ All models are wrong – this includes your plant simulator and vendor simulation models. In-plant testing is therefore critical and your most robust line of defense

Post Installation Configuration Control


Ongoing Configuration Control
✓ One of the advantages of digital systems is that they are easily modifiable – this is also a vulnerability if the process does not account for it
  • Processes need to exist to detect any inadvertent changes to a system’s configuration
  • “Baseline / Compare” utilities can be used to compare system states with a known and approved baseline configuration
  • Periodic audits of log, system and event files
  • Surveillance testing
  • Defined protocols for testing of authorized modifications (i.e. regression testing)
✓ Not all changes are modifications
  • Changes to calibration constants controlled in accordance with maintenance procedures
  • Pre-evaluated adjustments (tuning within defined boundaries)
  • Specific changes for Cyber Security incident response in accordance with CS procedures
✓ Reference EPRI Topical Report 1022991 – “Guideline On Configuration Management For Digital Instrumentation And Control Equipment And Systems”
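One way a “Baseline / Compare” check could work is sketched below. It assumes the system's configuration is available as a directory of files and uses SHA-256 digests to detect differences; the function names and the file-based layout are assumptions for illustration and do not describe any particular vendor utility.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(config_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to config_dir) to its SHA-256 digest."""
    return {
        str(p.relative_to(config_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(config_dir.rglob("*"))
        if p.is_file()
    }


def compare(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files added, removed or changed relative to the approved baseline."""
    common = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(k for k in common if baseline[k] != current[k]),
    }
```

In use, the approved baseline would be captured once with snapshot() and stored under configuration control, then compared with a fresh snapshot during periodic audits. Any non-empty list in the result indicates a difference from the baseline that needs to be explained.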

Questions?