Cisco NSO Developer Days
NSO in Brown Field Deployments
Tomas Mellgren, Director Engineering
Introduction
• An NSO deployment project almost always starts in a brown field environment, with existing systems or processes for configuring the devices in the network.
• This is neither wrong nor unusual, but it's important to understand the current situation and to have a vision of where you want to end up.
• Out of band changes in a brown field environment are one of the things that, if not approached correctly, can have large effects on the performance and behavior of the entire solution.
NSO and Out of Band Changes
NSO: Moving Parts
• a) NSO service instance data
  • Provisioned services. The data is based on the service YANG model.
• b) NSO device configuration
  • Representation of the configuration of all devices according to the NED YANG models.
• c) Device configuration
  • Configuration on the actual network device.
• d) Service mapping logic
  • Java/Python code and XML templates mapping service input parameters to device configuration.
[Figure: NSO holding a), d) and b); the Device holding c)]
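To make the four moving parts concrete, here is a toy model in Python, assuming a device configuration can be represented as a flat map of path to value. `Deployment`, `vpn_mapping` and the path strings are hypothetical illustrations, not the NSO API. It shows that provisioning through NSO updates (b) and (c) together, while an out of band change touches (c) only.

```python
from dataclasses import dataclass, field

# Hypothetical: a device configuration as a flat map of path -> value.
Config = dict[str, str]


def vpn_mapping(service: dict[str, str]) -> Config:
    """(d) Service mapping logic: turn service input into device config leaves."""
    prefix = f"interface/{service['interface']}"
    return {f"{prefix}/vrf": service["vrf"],
            f"{prefix}/description": f"VPN {service['name']}"}


@dataclass
class Deployment:
    services: dict[str, dict[str, str]] = field(default_factory=dict)  # (a)
    nso_device_config: Config = field(default_factory=dict)            # (b)
    device_config: Config = field(default_factory=dict)                # (c)

    def create_service(self, service: dict[str, str]) -> None:
        """Provision through NSO: store (a), run (d), update (b) and push to (c)."""
        self.services[service["name"]] = service
        changes = vpn_mapping(service)
        self.nso_device_config.update(changes)
        self.device_config.update(changes)

    def in_sync(self) -> bool:
        """True when (b) and (c) agree; an out of band change breaks this."""
        return self.nso_device_config == self.device_config


d = Deployment(nso_device_config={"hostname": "pe1"}, device_config={"hostname": "pe1"})
d.create_service({"name": "acme", "interface": "Gi0/1", "vrf": "ACME"})
print(d.in_sync())                      # True
d.device_config["hostname"] = "pe1-x"   # out of band change: edits (c) only
print(d.in_sync())                      # False
```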
What is an Out Of Band Change
• An out of band change is a human or system making changes directly to the device without going through NSO.
• NSO operates in three different types of environments with respect to out of band changes.
• The toolset used differs between these environments.
• It is important to identify which environment NSO is used in, and to design the system with that in mind.
• A common scenario is that a customer starts in one environment with the explicit goal of using NSO to gradually move to another, to achieve full automation.
Out of Band Change Environments
It's important to identify which environment you are in, and which environment you want to be in.
1. No out of band changes allowed.
2. Some out of band changes, but always unrelated to NSO services.
3. Many out of band changes that are out of control (or part of a transition phase) and that affect read and/or write operations from NSO.
Are you automating manual tasks, or designing for automation from the ground up?
Automation Journey
Low automation maturity:
• Mainly device level operations.
• A lot of manual processes.
• Uncontrolled configuration of devices by multiple systems and humans; a lot of out of band changes.
• Lack of a central authority for resource allocation.
Medium automation maturity:
• Automated service lifecycle and insight.
• Some manual processes, mainly for break-fix or operational procedures.
• Some out of band changes.
High automation maturity:
• Fully automated service lifecycle and operational procedures.
• All write operations towards the network are under full control.

• NSO has a number of tools and features for working in environments of different automation maturity and for mitigating the limiting effect of, for example, out of band changes.
• These tools and features can also be used to help transition the deployment towards higher automation maturity.
• Using these features in most cases comes at a cost in performance and/or complexity.
Features for Brown Field Deployments
• Historically, the only option for getting the NSO device configuration (b) in sync with the device configuration (c) has been the sync-from operation.
• Features for brown field deployments are a focus area, and many features have been added to help with different challenges:
  • no-overwrite
  • partial-sync-from
  • no-out-of-sync-check
  • etc.
• The following slides show some of the most important features to use in deployments with out of band changes.
The Problems with sync-from
• Never recommended as an operation that runs continuously. Use sync-from as a bootstrap operation or under strict control.
• Sync-from can be dangerous:
  • It completely overwrites the NSO device configuration (b) without guard rails.
  • It potentially overwrites device configuration written by existing service instances.
  • After a sync-from it becomes harder to see what has actually changed out of band in the network, since you can no longer diff between NSO and the network.
• Sync-from can be slow:
  • It can be a very costly operation for devices with large configurations.
  • It takes a device lock, so it prevents service instantiations towards that device.
[Figure: NSO reads the entire device configuration (c) into the NSO device configuration (b)]
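The "dangerous" points can be illustrated with a small toy model, assuming configurations are flat path-to-value maps. `compare_config` and `sync_from` here are hypothetical stand-ins that only mimic the effect described above, not NSO calls. Before the sync-from, the diff shows the out of band change; afterwards, the service-written value is gone from (b) and the diff is empty.

```python
def compare_config(nso_cfg: dict[str, str], device_cfg: dict[str, str]) -> dict:
    """Diff (b) against (c): path -> (nso value, device value) where they differ."""
    paths = nso_cfg.keys() | device_cfg.keys()
    return {p: (nso_cfg.get(p), device_cfg.get(p))
            for p in sorted(paths) if nso_cfg.get(p) != device_cfg.get(p)}


def sync_from(nso_cfg: dict[str, str], device_cfg: dict[str, str]) -> None:
    """Replace (b) wholesale with (c): the entire config is read, no guard rails."""
    nso_cfg.clear()
    nso_cfg.update(device_cfg)


# A service wrote vrf ACME; someone changed it to OTHER out of band.
nso = {"interface/Gi0/1/vrf": "ACME", "hostname": "pe1"}
device = {"interface/Gi0/1/vrf": "OTHER", "hostname": "pe1"}

print(compare_config(nso, device))  # {'interface/Gi0/1/vrf': ('ACME', 'OTHER')}
sync_from(nso, device)
print(compare_config(nso, device))  # {}  (the evidence of the change is gone)
print(nso["interface/Gi0/1/vrf"])   # OTHER (the service's value was overwritten in (b))
```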
No Out of Band Changes
No OOB Changes
• NSO is the single point of configuration authority for a given device.
• A device being out of sync is an exception, and is considered an alarming event that needs operator intervention.
• In this environment, the "sync-from" operation is only used as a bootstrap operation, needed once when you onboard a new device.
• This environment provides the highest automation maturity and the highest network configuration authenticity.
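A minimal sketch of this "out of sync is an exception" behavior, assuming NSO can compare a fingerprint of the device configuration with the one it recorded at its last write. The SHA-256 fingerprint is a stand-in for whatever transaction ID or hash a given device and NED actually support (an assumption); `commit` and `OutOfSyncError` are hypothetical names. The baseline is recorded once at onboarding, and any later out of band change makes the next commit fail.

```python
import hashlib
import json


class OutOfSyncError(Exception):
    """Raised when the device changed since NSO last saw it."""


def fingerprint(config: dict[str, str]) -> str:
    """Stable hash of a config; stands in for a device transaction ID (assumption)."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


def commit(nso_cfg, device_cfg, last_seen: str, changes: dict[str, str]) -> str:
    """Default behavior in this sketch: refuse to write to an out of sync device."""
    if fingerprint(device_cfg) != last_seen:
        raise OutOfSyncError("device is out of sync, operator intervention needed")
    nso_cfg.update(changes)
    device_cfg.update(changes)
    return fingerprint(device_cfg)  # new baseline after NSO's own write


nso = {"hostname": "pe1"}
device = dict(nso)
seen = fingerprint(device)          # recorded once, at onboarding (bootstrap sync-from)
seen = commit(nso, device, seen, {"interface/Gi0/1/vrf": "ACME"})
device["ntp/server"] = "10.0.0.1"   # an out of band change
try:
    commit(nso, device, seen, {"interface/Gi0/2/vrf": "ACME"})
except OutOfSyncError as err:
    print("rejected:", err)
```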
Controlled Out of Band Changes
Controlled OOB Changes
• Configuration provisioned through NSO writes (or reads) certain parts of the device configuration (e.g. MPLS VPN).
• Another system or human configures the same device, but writes to different parts of the configuration.
• The configuration changes performed out of band are known and will not change configuration that NSO writes or reads.
[Figure: NSO reads and writes the device configuration (c)]
Controlled OOB Changes in NSO
• NSO being out of sync with the device is accepted, since the out of band changes are safe.
• NSO is configured to skip the sync check during service provisioning.
• The advantage is that there is no performance penalty. In fact, this is the fastest option, since the sync check itself is rather expensive on some devices.
• no-overwrite can potentially be used as an additional safety check (see next section).
[Figure: NSO reads and writes the device with no sync check]
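As a sketch of the trade-off, here is a toy commit with an optional sync check. The flag plays the role of the no-out-of-sync-check feature mentioned earlier, but `commit`, `sync_check` and the full-dictionary comparison are hypothetical simplifications. A known, safe out of band change in a part of the configuration NSO never touches blocks the default commit, but not the one with the check skipped.

```python
class OutOfSyncError(Exception):
    pass


def commit(nso_cfg, device_cfg, changes, *, sync_check=True):
    """Push changes to (b) and (c). In this toy, the sync check is a full comparison,
    which is what makes it expensive; skipping it avoids that read entirely."""
    if sync_check and nso_cfg != device_cfg:
        raise OutOfSyncError("NSO device configuration differs from the device")
    nso_cfg.update(changes)
    device_cfg.update(changes)


nso = {"hostname": "pe1"}
# A known, controlled out of band change in a part of the config NSO never touches.
device = {"hostname": "pe1", "snmp/community": "ops"}

try:
    commit(nso, device, {"vrf/ACME/rd": "65000:1"})
except OutOfSyncError:
    print("default commit rejected")

commit(nso, device, {"vrf/ACME/rd": "65000:1"}, sync_check=False)
print(device["vrf/ACME/rd"])     # 65000:1
print("snmp/community" in nso)   # False: NSO stays out of sync, by design
```

Skipping the check is only safe because, in this environment, the out of band changes never overlap what NSO writes or reads.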
Uncontrolled Out of Band Changes - Write
Uncontrolled OOB Changes - Write
• Configuration provisioned through NSO writes certain parts of the device configuration (e.g. MPLS VPN).
• Other systems and humans perform unknown out of band changes to the same device.
• Sometimes these changes are service WRITE impacting, i.e. they overwrite configuration that an NSO service wants to write or has written.
• In almost all cases, uncontrolled out of band changes are considered an operational failure/incident, and the tools in NSO are used to identify and understand the root cause.
[Figure: NSO reads and writes the device]
Uncontrolled OOB Changes – Write in NSO – no-overwrite
• A more granular approach than the sync check is the "no-overwrite" feature.
• It tells NSO not to write changes to a device if someone else has changed the configuration NSO is about to write.
• NSO only raises an error if something in that part of the configuration has changed.
• This reduces the number of services failing to provision through NSO because of out of band changes that aren't service impacting.
• The performance impact is typically small, but depends on the capabilities of the device, similar to partial sync.
[Figure: NSO writes to the device after a no-overwrite check]
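The core idea of no-overwrite can be sketched in a few lines, assuming flat path-to-value configurations: only the paths the commit is about to write are compared between the device (c) and NSO's copy (b). `commit_no_overwrite` and `OverwriteError` are hypothetical names for this toy model, not NSO's implementation. An unrelated out of band change (the hostname) is tolerated; a change to a leaf the service writes is refused.

```python
class OverwriteError(Exception):
    def __init__(self, conflicts: set[str]):
        super().__init__(f"device changed data NSO is about to write: {sorted(conflicts)}")
        self.conflicts = conflicts


def commit_no_overwrite(nso_cfg, device_cfg, changes):
    """Check only the paths this commit writes: if the device value differs from
    NSO's copy (b), someone changed it out of band, so refuse to overwrite it."""
    conflicts = {p for p in changes if device_cfg.get(p) != nso_cfg.get(p)}
    if conflicts:
        raise OverwriteError(conflicts)
    nso_cfg.update(changes)
    device_cfg.update(changes)


nso = {"interface/Gi0/1/vrf": "ACME", "hostname": "pe1"}
device = {"interface/Gi0/1/vrf": "ACME", "hostname": "pe1-renamed"}  # unrelated OOB change

commit_no_overwrite(nso, device, {"interface/Gi0/1/mtu": "9000"})  # succeeds
device["interface/Gi0/1/vrf"] = "OTHER"                            # service-impacting change
try:
    commit_no_overwrite(nso, device, {"interface/Gi0/1/vrf": "ACME2"})
except OverwriteError as err:
    print(err.conflicts)  # {'interface/Gi0/1/vrf'}
```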
Uncontrolled OOB Changes – Write in NSO – Error Handling
• If an error is raised, either by the default sync check or by no-overwrite, a human or code needs to analyze what has happened, make a decision and perform some kind of error handling.
• The decision can only be made by someone/something with full understanding of the use case and the engineering/ops policies in place.
• Operations like service check-sync and compare-config are typically used to identify what has changed, determine the affected services and decide on mitigation actions.
[Figure: NSO's write to the device fails with ERROR!]
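The analysis step can be sketched as: diff (b) against (c), then map each changed path to the services that wrote it. The `owners` map below is a hypothetical stand-in for the service-to-configuration bookkeeping NSO keeps; all names are illustrative. The sketch only gathers facts for the decision, while the decision itself stays with the operator or policy code.

```python
def compare_config(nso_cfg: dict[str, str], device_cfg: dict[str, str]) -> list[str]:
    """Paths whose value differs between NSO's copy (b) and the device (c)."""
    return sorted(p for p in nso_cfg.keys() | device_cfg.keys()
                  if nso_cfg.get(p) != device_cfg.get(p))


def affected_services(changed_paths, owners):
    """Map changed paths to the services that wrote them (hypothetical ownership map)."""
    hit: dict[str, list[str]] = {}
    for path in changed_paths:
        for svc in sorted(owners.get(path, ())):
            hit.setdefault(svc, []).append(path)
    return hit


nso = {"interface/Gi0/1/vrf": "ACME", "interface/Gi0/2/vrf": "BETA", "hostname": "pe1"}
device = {"interface/Gi0/1/vrf": "OTHER", "interface/Gi0/2/vrf": "BETA", "hostname": "pe1-x"}
owners = {"interface/Gi0/1/vrf": {"vpn-acme"}, "interface/Gi0/2/vrf": {"vpn-beta"}}

changed = compare_config(nso, device)
print(changed)                             # ['hostname', 'interface/Gi0/1/vrf']
print(affected_services(changed, owners))  # {'vpn-acme': ['interface/Gi0/1/vrf']}
unowned = [p for p in changed if p not in owners]
print(unowned)                             # ['hostname'] (not service impacting)
```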
Uncontrolled Out of Band Changes - Read
Uncontrolled OOB Changes - Read
• Configuration provisioned through NSO writes certain parts of the device configuration (e.g. MPLS VPN).
• In the NSO service code, other parts of the NSO device configuration (b) are read and used to determine the service output configuration or to perform some kind of validation.
• Other systems and humans perform unknown out of band changes to the same device.
• Sometimes these changes are service READ impacting, i.e. they change device configuration (c) that an NSO service reads through its copy in (b). This causes the NSO service to base its logic on inaccurate information.
[Figure: NSO reads and writes the device]
Uncontrolled OOB Changes – Read in NSO – partial-sync
• A more granular approach than sync-from is the partial-sync feature.
• Instead of syncing/reading the entire configuration, NSO can read only the parts of the configuration supplied as parameters.
• Partial sync can provide a significant performance improvement compared to sync-from, but the improvement depends on the capabilities of the device and on the structure of the configuration.
[Figure: NSO reads selected parts of the device configuration (c) into (b) with partial-sync]
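A toy version of partial sync, assuming configuration paths can be selected by prefix: only the selected subtrees of (b) are refreshed from the device, and a counter shows how much less is read than with a full sync-from. `partial_sync_from`, `read_device` and the prefix syntax are hypothetical. How much a real device allows NSO to read selectively is exactly the device-capability caveat above.

```python
def selected(path: str, prefixes: list[str]) -> bool:
    return any(path.startswith(prefix) for prefix in prefixes)


def read_device(device_cfg, prefixes, stats):
    """Stand-in for fetching config from the device; counts the leaves transferred."""
    data = {p: v for p, v in device_cfg.items() if selected(p, prefixes)}
    stats["leaves_read"] += len(data)
    return data


def partial_sync_from(nso_cfg, device_cfg, prefixes, stats):
    """Refresh only the selected subtrees of (b); everything else is left alone."""
    for p in [p for p in nso_cfg if selected(p, prefixes)]:
        del nso_cfg[p]  # also drops leaves that were deleted on the device
    nso_cfg.update(read_device(device_cfg, prefixes, stats))


device = {f"interface/Gi0/{i}/description": "x" for i in range(1000)}
device["vrf/ACME/rd"] = "65000:2"                 # changed out of band
nso = {**device, "vrf/ACME/rd": "65000:1"}        # NSO's stale copy

stats = {"leaves_read": 0}
partial_sync_from(nso, device, ["vrf/"], stats)
print(nso["vrf/ACME/rd"], stats["leaves_read"])   # 65000:2 1  (sync-from would read 1001)
```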
Uncontrolled OOB Changes – Read in NSO – live-status
• In certain scenarios it can be beneficial to use live-status calls instead of partial-sync.
• The NSO device configuration (b) that the reads depend on will remain out of sync.
• live-status reads can be done directly from the service code, or as a distinct step before service provisioning. The information received from the device is used only in the context of the service logic.
• This can be a useful approach if:
  • The values read change frequently.
  • No other service needs this data to be in sync.
  • It is for some reason costly to sync the data, i.e. the efficiency of partial-sync is limited.
  • It is very efficient to retrieve the data using a show command directly from the device.
[Figure: the NSO service logic reads the device directly with live-status, bypassing (b)]
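The sketch below contrasts basing a decision on NSO's stale copy (b) with a live read from the device. The example task, picking the first free VLAN, is an assumption chosen for illustration, and `live_show_vlans` is a hypothetical stand-in for a live-status show command (here it simply reads (c)). The live value is used only inside the service logic, and (b) is deliberately left untouched.

```python
def vlans_from_config(cfg: dict[str, str]) -> set[int]:
    """Extract VLAN IDs from paths like 'vlan/100/name'."""
    return {int(p.split("/")[1]) for p in cfg if p.startswith("vlan/")}


def live_show_vlans(device_cfg: dict[str, str]) -> set[int]:
    """Hypothetical stand-in for a live-status 'show vlan' call on the device."""
    return vlans_from_config(device_cfg)


def first_free_vlan(in_use: set[int], low: int = 100, high: int = 199) -> int:
    return next(v for v in range(low, high + 1) if v not in in_use)


nso = {"vlan/100/name": "acme"}                              # (b): stale copy
device = {"vlan/100/name": "acme", "vlan/101/name": "oob"}   # (c): 101 added out of band

print(first_free_vlan(vlans_from_config(nso)))  # 101 (would clash on the device)
vlan = first_free_vlan(live_show_vlans(device))
print(vlan)                                     # 102
print("vlan/101/name" in nso)                   # False: (b) stays out of sync by design
```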
Out of Band Changes for Existing Services
Out of Band Changes affecting Existing Services
• The examples above show how to handle out of band changes at provisioning time.
• NSO offers a number of tools to ensure consistency and to identify out of band changes for existing services:
  • device check-sync
  • compare-config
  • service check-sync
  • service deep-check-sync
  • service re-deploy dry-run
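To illustrate why checking a service against NSO's copy can differ from checking it against the network, here is a toy model that recomputes what a service would write and compares it with a target configuration. Loosely, comparing against (b) resembles service check-sync and comparing against the device (c) resembles service deep-check-sync. That correspondence, along with `vpn_mapping` and `service_diff`, is an assumption of this sketch, not a description of NSO internals.

```python
def vpn_mapping(service: dict[str, str]) -> dict[str, str]:
    """Hypothetical service mapping logic (d)."""
    return {f"interface/{service['interface']}/vrf": service["vrf"]}


def service_diff(service, target_cfg):
    """Leaves where target_cfg deviates from what the service would write:
    path -> (wanted value, actual value)."""
    wanted = vpn_mapping(service)
    return {p: (v, target_cfg.get(p)) for p, v in wanted.items() if target_cfg.get(p) != v}


svc = {"interface": "Gi0/1", "vrf": "ACME"}
nso = {"interface/Gi0/1/vrf": "ACME"}      # (b): untouched since provisioning
device = {"interface/Gi0/1/vrf": "OTHER"}  # (c): changed out of band

print(service_diff(svc, nso))     # {}  (a check against (b) finds nothing)
print(service_diff(svc, device))  # {'interface/Gi0/1/vrf': ('ACME', 'OTHER')}
```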
What does it all mean?
Conclusions
• Sync-from: use it with care!
  • Analyze why sync-from is needed. If it is currently used continuously, it will always limit your possibilities for full automation.
• Analyze where the out of band changes are coming from.
  • Should they exist at all?
  • Can/should they be automated instead?
• When looking at the performance of an NSO deployment, you need to look at the entire solution stack.