IntelliMagic Vision for SAN DFW Lunch-n-Learn Series

IntelliMagic Vision for SAN DFW Lunch-n-Learn Series
Brett Allison, Director of Technical Services
Brian Howard, VP of Sales

Agenda
• Improving the Storage Infrastructure:
  ‒ Problem Process
  ‒ Proactive Process
  ‒ Planning Process

Improving the Problem Process

Recent Customer Example - Background
• Frequent performance problems
• IT had been outsourced
• Client owns the infrastructure
• An outsourcer manages the infrastructure
• Most of the resources reside off-shore
• Lack of deep performance skills off-shore, and on-shore resources are over-subscribed
• Lack of visibility with current tools, resulting in weeks of firefighting
• Client/outsourcer relationship had a certain amount of friction

Customer Problem - Summary
• Specific instance of a problem is with a server named topaxdb2p051
• Summary of Findings:
  ‒ IBM Spectrum Virtualize Global Mirror is used for remote replication and data recovery
  ‒ Restoration of the database associated with topaxdb2p051 led to significant write increases and write performance degradation for topaxdb2p051 and likely many other systems within the shared ecosystem

IntelliMagic Visibility Approach
• Install a lightweight collector to monitor replication, the fabric, and key storage systems
• Send data to the IntelliMagic SaaS environment
• Analyze the data and provide context- and technology-centered results using White Box Artificial Intelligence

Compatible Machine Assistance Approaches
Black-box Analysis (typical for most statistical approaches): Reactive
• Platform-agnostic
• Quick, relative correlations only
• Focused on problem symptom metrics, not truly predictive
• Has the workload changed?
White-box Analysis (aka Availability Intelligence): Pro-active
• Platform-specific interpretation
• Algorithms access expert knowledge
• Focused on root causes for predictive and prescriptive insights
• Can each subcomponent handle work?

Front-End Write Response [rating: 0.28]
For Serial 'TOP_SVC_03_GM' by Storage Pool
Rating based on DSS Storage Pool data using DSS Thresholds
Primary site write latency to all storage pools with replicated volumes increased significantly between 9:15 AM and 5:00 PM on 11/27/2018.

Response Time for Replication Writes [rating: 2.92]
For Serial 'TOP_SVC_03_GM'
Rating based on DSS Storage Pool data using DSS Thresholds
Secondary site write latency to all storage pools with replicated volumes increased significantly between 9:15 AM and 5:00 PM on 11/27/2018.

Replication Writes for Spectrum Virtualize
For Serial 'TOP_SVC_03_GM'
The number of replicated write tracks (64 KB) increased significantly during the problem period.

Replication Send [rating: 0.00]
For Serial 'TOP_SVC_03_GM'
Rating based on DSS Links data using DSS Thresholds
The send MB/sec increased from an average of around 180 MB/sec to 360 MB/sec during this peak period.
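
The send rate can be related back to the 64 KB replication write tracks from the earlier chart. The following is a minimal sketch of that arithmetic, not the product's own calculation. It assumes 1 MB = 1024 KB and that the link traffic consists entirely of replicated track data with no protocol overhead; the function names are hypothetical.

```python
TRACK_KB = 64  # track size stated on the Replication Writes slide


def mb_per_sec_to_tracks_per_sec(mb_per_sec: float) -> float:
    """Convert link throughput to an equivalent rate of 64 KB tracks.

    Assumes 1 MB = 1024 KB and no protocol overhead (both assumptions).
    """
    return mb_per_sec * 1024 / TRACK_KB


def tracks_per_sec_to_mb_per_sec(tracks_per_sec: float) -> float:
    """Inverse conversion: 64 KB tracks per second to MB/sec."""
    return tracks_per_sec * TRACK_KB / 1024


# Figures quoted on the slide: roughly 180 MB/sec average, 360 MB/sec at peak.
baseline_tracks = mb_per_sec_to_tracks_per_sec(180)
peak_tracks = mb_per_sec_to_tracks_per_sec(360)
growth_factor = peak_tracks / baseline_tracks

print(f"baseline: {baseline_tracks:.0f} tracks/sec")
print(f"peak:     {peak_tracks:.0f} tracks/sec")
print(f"growth:   {growth_factor:.1f}x")
```

Under these assumptions the peak corresponds to a doubling of the replicated track rate, which is consistent with the "increased significantly" observation on the Replication Writes chart.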

Top 10 Replication Writes Tracks
For Serial 'TOP_SVC_03_GM' by Volume Label
The majority of the increase was related to write tracks to SO_purescalecluster_appp_00* LUNs.

Top 10 Replication Write Response Time
For Serial 'TOP_SVC_03_GM' by Volume Label
The secondary write latency increased from < 100 ms to > 200 ms for SO_purescalecluster_appp_00* LUNs.

Port to Remote Node Response Time [rating: 0.17]
by Serial
Rating based on Host Adapters data using DSS Thresholds
The increase in average latency from the primary site (TOP_SVC_03_GM) to the secondary site is easy to spot in this chart over the last 7 days. This is a good way to monitor whether the condition is occurring.

Zero Buffer to Buffer Credits [rating: 2.86]
For Switch WWN 'topfddcxp003'
Rating based on Switch Ports data using Switch and Port Thresholds
When Global Mirror is forcing synchronous writes over a congested link with high latency, the writes on the primary site consume valuable buffer credits. This can impact users without any replication, because buffer credits are a limited, shared resource on the switches. This chart shows the resulting increase in buffer credit shortages.

Summary of Findings
• Host write response time is very poor during peak periods
• Restores are happening to systems with replicated volumes, resulting in unnecessary traffic
• All symptoms point to a bandwidth-constrained replication environment

IntelliMagic Recommendations
✓ Two ways to resolve this issue within current technology:
  • Add bandwidth
  • Add storage capacity at the primary site and configure Global Mirror with Change Volumes
✓ Best Practice:
  • Coordinate large restorations to ensure volumes are not replicated during restoration processes

IntelliMagic Recommendations - Continued
✓ Implement processes and tools with the goal of improving the predictability of the performance of the environment:
  • Implement IntelliMagic Vision to provide deep visibility into all facets of the SAN infrastructure
  • Provide training for staff and customized dashboards to provide quick root cause analysis and understanding of what to do
  • Provide alerting when bandwidth requirements exceed available bandwidth
  • Provide investigation processes to quickly identify hosts/applications causing issues and remediation steps
  • Provide ongoing performance analysis services
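
The bandwidth alerting recommendation above can be illustrated with a simple threshold check. This is only a sketch under stated assumptions and does not represent how IntelliMagic Vision implements alerting. The `Sample` type, the `bandwidth_alerts` function, the 400 MB/sec link capacity, the 80% threshold, and the sample timestamps are all hypothetical; only the 180 and 360 MB/sec values echo the figures from the Replication Send chart.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One interval of replication link measurements (hypothetical shape)."""
    timestamp: str
    send_mb_per_sec: float


def bandwidth_alerts(
    samples: List[Sample],
    link_capacity_mb_per_sec: float,
    threshold: float = 0.8,
) -> List[Sample]:
    """Return the samples whose send rate exceeds threshold * link capacity."""
    limit = link_capacity_mb_per_sec * threshold
    return [s for s in samples if s.send_mb_per_sec > limit]


# Illustrative data: capacity and timestamps are assumed, not from the study.
samples = [
    Sample("09:00", 180.0),
    Sample("09:15", 360.0),
    Sample("17:15", 185.0),
]
alerts = bandwidth_alerts(samples, link_capacity_mb_per_sec=400.0)

for s in alerts:
    print(f"{s.timestamp}: {s.send_mb_per_sec:.0f} MB/sec exceeds alert limit")
```

Alerting on a fraction of capacity rather than on capacity itself leaves headroom to react before replication writes start to back up onto the primary site.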

Improving the Proactive Process

Proactive Best Practice #1 Daily Review of Vendor Specific Storage Array Key Performance Indicators

Proactive Best Practice #2 Daily Review of SAN Fabric Health

Proactive Best Practice #3 Daily Review of Host I/O Workload

Proactive Best Practice #4 Daily Review of Key Capacity Indicators

Proactive Best Practice #5 Configuration: Audit SAN Zoning Health

Improving the Planning Process

Plan Your Capacity Growth Quarterly: Plan for storage capacity

Capacity Forecast Over Time Quarterly: Plan for storage capacity
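
A quarterly capacity forecast of this kind can be approximated with a straight-line trend. The sketch below is one simple approach, not the method used in the presentation's charts: it fits a least-squares line to past usage and estimates how many quarters remain before usable capacity is reached. The function names and all figures are hypothetical.

```python
from typing import List, Optional, Tuple


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quarters_until_full(used_tb: List[float], usable_tb: float) -> Optional[float]:
    """Estimate quarters remaining after the last sample until usable_tb is hit.

    Returns None when usage is flat or shrinking (no projected exhaustion).
    Needs at least two quarterly samples.
    """
    xs = [float(i) for i in range(len(used_tb))]
    slope, intercept = linear_fit(xs, used_tb)
    if slope <= 0:
        return None
    quarter_full = (usable_tb - intercept) / slope
    return max(0.0, quarter_full - xs[-1])


# Hypothetical usage history: one sample per quarter, in TB.
history_tb = [100.0, 110.0, 120.0, 130.0]
remaining = quarters_until_full(history_tb, usable_tb=200.0)
print(f"Projected quarters until full: {remaining:.1f}")
```

A linear trend is the simplest reasonable choice for a quarterly review; growth that accelerates or arrives in steps would need a different model.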

IntelliMagic Storage Infrastructure Visibility Improves Your:
• Problem Process
• Proactive Process
• Planning Process

Thank You for Coming!
• We will now do the drawing
• We’re local, ready to help!
• Amy Quick will call you about setting up discovery meetings
www.intellimagic.com