Transactions in NSO 4.4, April 2017

Overview of the NSO Transaction Processing

Operator actions: logs in.

The operator logs in and starts a session. The session is connected to a new (empty) transaction with the Running data store as the backend.

A transaction is a change set relative to something else, the backend. Here, the Running data store is the backend.

Transaction: <empty>
Running: x=3, y=false, z=1.2.3.4

First Change

Operator actions: logs in; change x=15.

The operator makes a change. The transaction is updated with the changed value.

Transaction: x=15
Running: x=3, y=false, z=1.2.3.4

Within the transaction, reading x gives "15", reading y gives "false", and reading z gives "1.2.3.4". Other applications, reading from the Running data store, will see that x is "3".
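The read-through behaviour on this slide can be sketched as a two-level lookup. This is an illustration only, not NSO's API: it assumes plain Python dictionaries stand in for the transaction and the Running data store, and it uses the standard library's `collections.ChainMap` to search the transaction first and the backend second.

```python
from collections import ChainMap

# Stand-ins for the slide's data; plain dicts, not NSO objects.
running = {"x": 3, "y": False, "z": "1.2.3.4"}
transaction = {}  # a new transaction starts out empty

# Reads look in the transaction first, then fall through to the backend.
view = ChainMap(transaction, running)

transaction["x"] = 15  # operator change: x=15

print(view["x"])     # value seen inside the transaction
print(view["y"])     # unchanged element, read from the backend
print(running["x"])  # what other applications reading Running see
```

The backend dictionary is never written to here, which mirrors the point that Running is untouched until commit.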

Changing the Same Element Again

Operator actions: logs in; change x=17.

The operator changes the same element again. The transaction is updated with the changed value: x=15 is overwritten with x=17. An element in a given transaction can have only a single value.

There is no concept of time inside a transaction, no before and after. A transaction is a set of changes, not a sequence.

Transaction: x=17
Running: x=3, y=false, z=1.2.3.4

More Changes

Operator actions: logs in; change x=17; change y=true, z=1.2.3.4.

The operator changes some more elements. The transaction is updated with the additional changes. This is still the same, single transaction.

Setting an element to the value it already has in the backend has no effect; it is not a change. This is why z=1.2.3.4 does not appear in the transaction.

Transaction: x=17, y=true
Running: x=3, y=false, z=1.2.3.4
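The set semantics described so far (one value per element, no ordering, and no entry for a value that equals the backend) can be sketched as follows. The helper `set_value` is hypothetical and only illustrates the idea. In particular, dropping an earlier entry when an element is set back to the backend value is an assumption of this sketch, not something the slides state.

```python
def set_value(transaction, backend, key, value):
    """Record a change; hypothetical helper, not an NSO call."""
    if backend.get(key) == value:
        # Same value as the backend: not a change, so keep no entry for it.
        transaction.pop(key, None)
    else:
        # One value per element: a later set overwrites an earlier one.
        transaction[key] = value


running = {"x": 3, "y": False, "z": "1.2.3.4"}
transaction = {}

set_value(transaction, running, "x", 15)
set_value(transaction, running, "x", 17)         # overwrites x=15
set_value(transaction, running, "y", True)
set_value(transaction, running, "z", "1.2.3.4")  # equals the backend: no change

print(transaction)
```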

Commit: This is the topic of this presentation

Operator actions: logs in; change x=17; change y=true, z=1.2.3.4; commit.

The transaction manager kicks off the commit sequence when the operator requests the transaction to be committed. What NSO does during the commit sequence is the main topic of this presentation.

Transaction: x=17, y=true
Running: x=3, y=false, z=1.2.3.4

Possible Outcomes: OK or Fail

Operator actions: logs in; change x=17; change y=true, z=1.2.3.4; commit.

Before we dive into the commit sequence, we should understand that there are two possible outcomes of a transaction:

OK: The transaction was processed without error and all of it has been acted upon.
Fail: The transaction failed due to an error, and no part of it has been acted upon.

In principle, an outside observer should not be able to observe a failed transaction.

Running when the operator sees "OK": x=17, y=true, z=1.2.3.4
Running when the operator sees "Fail": x=3, y=false, z=1.2.3.4

A Closer Look at the Commit Sequence

Stages: Lock, Trans Hooks, Validate, Prepare, Commit, Notify, Unlock

The main stages of a transaction, according to the literature, are Prepare-Commit (for simple two-phase transactions) or Prepare-Commit-Confirm (for three-phase transactions). NSO extends these phases with a few more steps to lay the groundwork and deal with changes afterwards.

Operator: x=17, y=true, z=1.2.3.4
Transaction: x=17, y=true
Running: x=3, y=false, z=1.2.3.4

Lock

Locking, so that no other changes are under way while we are processing this transaction, is key to a simple programming model. On the other hand, this constraint limits the maximum throughput (transactions per minute) of the system.

Transaction: x=17, y=true
Running: x=3, y=false, z=1.2.3.4
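A minimal sketch of what the lock buys: with a single lock around the critical region, concurrent commits are applied one at a time. The names `running_lock` and `commit` are hypothetical, and a `threading.Lock` merely stands in for whatever mechanism NSO actually uses.

```python
import threading

running = {"x": 3, "y": False, "z": "1.2.3.4"}
running_lock = threading.Lock()  # hypothetical stand-in for the lock on Running


def commit(change_set):
    # Only one transaction at a time may be inside this region.
    with running_lock:
        running.update(change_set)


threads = [
    threading.Thread(target=commit, args=({"x": value},)) for value in (15, 17)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(running["x"])  # 15 or 17, depending on which commit ran last
```

Serializing the commits keeps each one simple to reason about, at the price that a slow commit delays every transaction queued behind it.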

Service Create

Trans Hooks steps: RFS Prevalidation, RFS Hook, Cust. Hook/Transform

The RFS Hook captures the changes for each modified service instance in a separate transaction-in-transaction, with the operator's transaction as backend.

The transaction manager computes the Undo information for each service instance, and injects it into the service data.

Service #1 Create: m=3; Undo #1: delete m
Service #2 Create: p="auto" …; Undo #2: delete p

Transaction: x=17, y=true
Running: x=3, y=false, z=1.2.3.4
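The undo computation on this slide can be sketched as follows. This is an assumption-laden illustration, not NSO's implementation: dictionaries stand in for change sets, `ChainMap` models the operator's transaction layered over Running, and `undo_for` and the `DELETE` marker are hypothetical names.

```python
from collections import ChainMap

DELETE = "<delete>"  # marker used here to mean "remove this element"

running = {"x": 3, "y": False, "z": "1.2.3.4"}
operator_txn = {"x": 17, "y": True}

# The service's backend is the operator's transaction layered over Running.
backend = ChainMap(operator_txn, running)


def undo_for(service_changes, backend):
    """Return the change set that would reverse service_changes."""
    undo = {}
    for key in service_changes:
        # Restore the old value if there was one, otherwise delete the element.
        undo[key] = backend[key] if key in backend else DELETE
    return undo


# Each service instance writes into its own separate change set.
service_1 = {"m": 3}
service_2 = {"p": "auto"}

undo_1 = undo_for(service_1, backend)
undo_2 = undo_for(service_2, backend)
print(undo_1, undo_2)
```

Because neither m nor p exists in the backend, both undo sets consist of a delete, matching "delete m" and "delete p" on the slide.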

Validate

Validate steps: YANG Validation, Custom Validation

Validation has to happen after all transaction hooks have run. Transaction hooks can (and typically do) update the contents of the transaction, and we need to validate the final contents of the transaction, after all changes are done.

Transaction: x=17, y=true
Service #1: m=3, <undo #1>
Service #2: p="auto", <undo #2>
Running: x=3, y=false, z=1.2.3.4
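Why the ordering matters can be shown with a small sketch. Everything here is hypothetical: the hook `add_p` and the rule `require_p` are invented for illustration and are not taken from NSO or from the slides.

```python
def run_hooks_then_validate(transaction, hooks, validators):
    """Run every hook first, then validate the final contents."""
    for hook in hooks:
        hook(transaction)  # hooks may add or change elements
    errors = []
    for validate in validators:
        error = validate(transaction)
        if error:
            errors.append(error)
    return errors


def add_p(txn):
    # Hypothetical hook: fills in p when y is enabled.
    if txn.get("y"):
        txn["p"] = "auto"


def require_p(txn):
    # Hypothetical rule: y=true requires p to be set.
    if txn.get("y") and "p" not in txn:
        return "p must be set when y is true"
    return None


txn = {"x": 17, "y": True}
too_early = require_p(dict(txn))  # validating before the hooks have run
errors = run_hooks_then_validate(txn, [add_p], [require_p])
print(too_early, errors)
```

Validating before the hook reports an error that no longer applies once the hook has completed the transaction.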

Prepare Phase

Participants (coordinated by the NED Manager): CDB, No-trans NED (CLI Device), Transaction NED (NETCONF Device), Custom Database, …

CDB: Writes down all its changes to the disk journal, except the final transaction complete mark. If HA is enabled, the same journal records that are written to disk are sent to standby nodes.

No-trans NED (CLI Device): Send all CLI commands to the device. This will validate and activate the new configuration. Did it work?

Transaction NED (NETCONF Device): Send all NETCONF commands to the device's candidate store and validate the new configuration. Did it work?

Transaction: x=17, y=true
Service #1: m=3, <undo #1>
Service #2: p="auto", <undo #2>
Running: x=3, y=false, z=1.2.3.4

Point of No Return: Commit or Abort?

Stages: Lock, Trans Hooks, Validate, Prepare, then either Commit or Abort, Notify, Unlock

The moment of truth: If any transaction participant returns failure, take the abort path. Otherwise proceed with Commit.

This is the point of no return. This is when we decide if the transaction went through or not. We will not (cannot) change our minds after this. This is defined by standard transaction theory.

Transaction: x=17, y=true
Running: x=3, y=false, z=1.2.3.4
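The decision rule can be sketched as a generic two-phase commit coordinator. The `Participant` class and `decide` function are hypothetical stand-ins and not NSO interfaces. The sketch also assumes that every participant is asked to prepare before the decision is taken, which the slides do not spell out.

```python
class Participant:
    """Hypothetical transaction participant (for example CDB or a NED)."""

    def __init__(self, name, accepts=True):
        self.name = name
        self.accepts = accepts
        self.log = []

    def prepare(self, changes):
        self.log.append("prepare")
        return self.accepts

    def commit(self):
        self.log.append("commit")

    def abort(self):
        self.log.append("abort")


def decide(participants, changes):
    """Prepare everyone, then commit only if nobody reported failure."""
    votes = [p.prepare(changes) for p in participants]
    if all(votes):
        # Point of no return: from here on the outcome is "OK".
        for p in participants:
            p.commit()
        return "OK"
    for p in participants:
        p.abort()
    return "Fail"


changes = {"x": 17, "y": True}
good = [Participant("CDB"), Participant("NED")]
bad = [Participant("CDB"), Participant("NED", accepts=False)]
print(decide(good, changes), decide(bad, changes))
```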

Commit Phase

Participants (coordinated by the NED Manager): CDB, No-trans NED (CLI Device), Transaction NED (NETCONF Device), Custom Database, …

CDB: Writes the final transaction complete mark to the disk journal. If HA is enabled, the transaction complete mark is also sent to standby nodes.

No-trans NED (CLI Device): All CLI commands have already been sent to the device and activated, so there is nothing to do.

Transaction NED (NETCONF Device): Send commit command to the device. This will activate the new configuration.

The transaction manager updates Running and creates the rollback file.

Rollback #4711: x=3, y=false
Running: x=17, y=true, z=1.2.3.4, p="auto", m=3, <undo#1>, <undo#2>
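The split between journal records written during prepare and the complete mark written during commit can be sketched like this. The record layout and the `replay` function are assumptions made for illustration; they describe a generic write-ahead journal, not CDB's actual on-disk format. The transaction number 4711 is borrowed from the rollback label on the slide purely as an example identifier.

```python
def prepare(journal, txn_id, changes):
    """Prepare phase: journal every change, but not the complete mark."""
    for key, value in changes.items():
        journal.append(("change", txn_id, key, value))


def commit(journal, txn_id):
    """Commit phase: the complete mark is the only record left to write."""
    journal.append(("complete", txn_id))


def replay(journal):
    """Rebuild state from the journal, skipping transactions without a mark."""
    completed = {record[1] for record in journal if record[0] == "complete"}
    state = {}
    for record in journal:
        if record[0] == "change" and record[1] in completed:
            _, _, key, value = record
            state[key] = value
    return state


journal = []
prepare(journal, 4711, {"x": 17, "y": True})
before_mark = replay(journal)  # no complete mark yet
commit(journal, 4711)
after_mark = replay(journal)
print(before_mark, after_mark)
```

In this sketch a transaction whose records lack the mark leaves no trace on replay, which is one way to get the all-or-nothing outcome from a single final write.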

Notify

Kickers and subscribers are informed about the changes (they care about) in the transaction.

Kickers and subscribers cannot reverse the transaction (we are past the point of no return).

Kicker Manager: Kicker client #1, Kicker client #2, …
Custom Subscribers: Subscriber seq=50, Subscriber seq=75, …

Running: x=17, y=true, z=1.2.3.4, p="auto", m=3, <undo#1>, <undo#2>
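A minimal sketch of "informed about the changes they care about": each subscriber registers the elements it is interested in and receives only the matching part of the change set. The `notify` function and the subscriber names are hypothetical; real NSO subscriptions are path based and this sketch does not model them.

```python
def notify(subscribers, changes):
    """Deliver to each subscriber only the changes it registered for."""
    delivered = {}
    for name, interests in subscribers.items():
        relevant = {k: v for k, v in changes.items() if k in interests}
        if relevant:
            delivered[name] = relevant
    return delivered


changes = {"x": 17, "y": True, "p": "auto", "m": 3}
subscribers = {
    "subscriber-a": {"x"},        # hypothetical subscriber names
    "subscriber-b": {"m", "p"},
    "subscriber-c": {"z"},        # z did not change, so nothing is delivered
}

delivered = notify(subscribers, changes)
print(delivered)
```

Nothing a subscriber does with its delivery feeds back into `changes`, in line with the point that notification happens after the point of no return.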

Unlock

After unlock, the next transaction in line can start processing. At any one time, only one transaction can be within the lock region (to keep the programming model sane). It is therefore very important that the time spent there is as short as possible.

The lion's share of the time is almost always spent waiting for devices to accept the configuration change and report back. For some devices with complicated CLIs, the time to compute the right sequence of CLI commands to send may also be significant.

If those things could be done outside the lock, throughput could increase many times over. This is indeed possible. How commit queues accomplish this is explained later in this presentation.

Running: x=17, y=true, z=1.2.3.4, p="auto", m=3, <undo#1>, <undo#2>
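A back-of-the-envelope calculation shows why time inside the lock bounds throughput. The timings below are made-up numbers chosen for easy arithmetic, not measurements from NSO.

```python
def max_transactions_per_minute(seconds_in_lock):
    """Upper bound when transactions pass through the lock one at a time."""
    return 60.0 / seconds_in_lock


device_wait = 4.5  # assumed seconds spent waiting for devices
other_work = 0.5   # assumed seconds for everything else inside the lock

with_devices_in_lock = max_transactions_per_minute(device_wait + other_work)
devices_outside_lock = max_transactions_per_minute(other_work)
print(with_devices_in_lock, devices_outside_lock)
```

With these assumed figures, moving the device wait outside the lock raises the bound tenfold, from 12 to 120 transactions per minute.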

Commit Queue: Prepare

When commit queues are enabled, the NED Manager prepares a queue item for each involved device. Each queue item consists of the device-specific change set and a snapshot handle that enables reading from Running as it looked at the time of queueing.

Many NEDs need to read more than the changed data in order to generate the appropriate commands on the device.

The snapshot database tracks what is on the device now. Running reflects all the committed data, including later queue items.

NED Manager queue items: Q Item #4711:1, Q Item #4711:2
Devices: CLI Device, NETCONF Device
Queue item contents: Dev. Changes: p="auto"; Snapshot: this version of Running

Transaction: x=17, y=true
Service #1: m=3, <undo #1>
Service #2: p="auto", <undo #2>
Running: x=3, y=false, z=1.2.3.4

Commit Queue: Commit

The NED Manager places the queue items on the device queues. Once queue items are on the device queues, they are sent to the devices regardless of new transactions being committed, aborted, etc.

Each queue item is handed to the respective NED for delivery to the device.

Device queues: #4711 for the CLI Device, #4711 for the NETCONF Device
Queue item contents: Dev. Changes: p="auto"; Snapshot: this version of Running

Running: x=17, y=true, z=1.2.3.4, p="auto", m=3, <undo#1>, <undo#2>

Commit Queue: Execute

Queue items are processed asynchronously to other activities in the system. Commit queues do not give transactional integrity, but change sets coming from the same transaction are sent out at roughly the same time to all participating devices.

If a device fails to accept the change, the device is marked as out of sync, and no further processing of that device queue takes place until the situation is cleared.

Device queue: #4724, #4716, #4711
Devices: CLI Device, NETCONF Device
Queue item contents: Dev. Changes: p="auto"; Snapshot: this version of Running
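The stop-on-failure behaviour of a device queue can be sketched as below. The `DeviceQueue` class is hypothetical and not an NSO API. Keeping the rejected item at the head of the queue and resuming by clearing a flag are assumptions of the sketch; the slides only say that processing stops until the situation is cleared. The item numbers are the ones shown on the slide.

```python
from collections import deque


class DeviceQueue:
    """Hypothetical per-device queue of change sets."""

    def __init__(self, send):
        self.send = send  # callable(item_id, changes) -> True on success
        self.items = deque()
        self.out_of_sync = False

    def enqueue(self, item_id, changes):
        self.items.append((item_id, changes))

    def process(self):
        """Send items in order; stop at the first one the device rejects."""
        done = []
        while self.items and not self.out_of_sync:
            item_id, changes = self.items[0]
            if self.send(item_id, changes):
                self.items.popleft()
                done.append(item_id)
            else:
                self.out_of_sync = True  # nothing more is sent to this device
        return done


rejected = {4716}  # pretend the device refuses this item
queue = DeviceQueue(send=lambda item_id, changes: item_id not in rejected)
for item_id in (4711, 4716, 4724):
    queue.enqueue(item_id, {"p": "auto"})

first_run = queue.process()

# The operator clears the situation; processing can resume.
rejected.clear()
queue.out_of_sync = False
second_run = queue.process()
print(first_run, second_run)
```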

Rainy Day Scenario: Abort

Stages: Lock, Trans Hooks, Validate, Prepare, Abort, Notify, Unlock

Participants (coordinated by the NED Manager): No-trans NED (CLI Device), Transaction NED (NETCONF Device), Custom Database, …

No-trans NED (CLI Device): Uh-oh, we've already activated the new configuration, so we need to revert from it. Service disruption may already have occurred. The NED can ask NSO for an inverse transaction, or a set of inverse CLI commands.

Inverse Transaction: x=3, y=false, delete m, p

Transaction NED (NETCONF Device): Send abort command to the device. This will discard the new configuration. It was never live, so zero disruption to services.

This is why we like transactional devices and protocols so much.

Transaction: x=17, y=true
Service #1: m=3, <undo #1>
Service #2: p="auto", <undo #2>
Running: x=3, y=false, z=1.2.3.4