A Beginner's Guide to HADR
Warwick Rudd
Warwick Rudd
- Blogger on SimpleTalk
- Speaker at TechEd Australia
- Speaker at Ignite New Zealand
- Speaker at Difinity New Zealand
- Speaker at SQL Saturday events
- Speaker at Local User Groups
- Host DBA Fundamentals DownUnder VC
- Host DBA DownUnder VC
@Warwick_Rudd
Warwick@sqlmastersconsulting.com.au
www.sqlmastersconsulting.com.au

Master Class Training
SQL Server Always On: The Senior DBA’s Field Guide
Edwin Sarmiento, Data Platform MVP, Microsoft Certified Master
- Where: Microsoft Brisbane
- When: February 26-28, 2018
- In Person: AU$1,557.00 + GST
- Online: AU$1,426.00 + GST
- Bonus: US$1000.00 material included
- 2 for 1 Offer
https://www.eventbrite.com.au/e/sql-server-always-on-the-senior-dbas-field-guide-tickets-40088531878

High Availability and Disaster Recovery (HA/DR) planning is not just about the technology.
Abstract
When thinking or talking about High Availability/Disaster Recovery, many people jump straight to a particular technology without understanding the other factors that affect the solution, and then waste time working backwards to understand the requirements. Recovery objectives, SLAs and budget are some of the commonly overlooked factors when planning and developing a HA/DR solution. In this session we will walk through the 6 building blocks I take into account when developing a HA/DR solution. You can then use the same methodology to implement High Availability and/or Disaster Recovery in your own environment, working forward to determine the right technology.

Goals
By the end of this session you will understand the 6 building blocks to use in designing and implementing a HA/DR solution for your SQL Server environment, rather than picking a technology first.

High Availability & Disaster Recovery
Accidental DBAs? DBAs? DB Developers? BI Developers? Team Leaders / Managers? Project Managers? Randoms?
What is High Availability?
“A system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period”
“The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with technical capabilities and infrastructure costs”
Microsoft - https://technet.microsoft.com/en-us/library/jj715263.aspx

What is Disaster Recovery?
“A system and set of processes that allow returning a system to a state of normality after the occurrence of a disastrous event”
“The principal goal of a disaster recovery solution is to resume as close to normal activity in a pre-defined amount of time as outlined by Service Level Agreements (SLAs)”

Determining a HA/DR Solution
- 9’s
- Downtime
- Recovery Point Objective (RPO)
- Recovery Time Objective (RTO)
- Recovery Level Objective (RLO)
- Technology
- SLAs
- $$$
- Skill

What is Down Time?
Scheduled Down Time
- DB Maintenance
- Failovers
- Patching
- System Configurations
- Upgrades
- Consolidations
Un-Scheduled Down Time
- Failovers
- Hardware Failures
- Loss of Network Connectivity
- Power Outages
- Database Corruption

Down Time
Nines (Availability %) | Downtime / Day (HH:MM:SS) | Downtime / Month (HH:MM:SS) | Downtime / Year (HH:MM:SS)
90                     | 02:24:00                  | 73:02:55                    | 876:34:55
99                     | 00:14:24                  | 07:18:17                    | 87:39:30
99.9                   | 00:01:26                  | 00:43:50                    | 08:45:57
99.99                  | 00:00:09                  | 00:04:23                    | 00:52:36
99.999                 | 00:00:01                  | 00:00:26                    | 00:05:16
99.9999                | 00:00:00.1                | 00:00:03                    | 00:00:32

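The downtime figures for each level of nines follow directly from the availability percentage. Here is a minimal Python sketch of that arithmetic. The helper names (`allowed_downtime`, `fmt`) are illustrative only, and the year length of 365.2425 days is an assumption chosen because it reproduces the yearly and monthly figures shown on the slide.

```python
from datetime import timedelta

# Assumption: a mean Gregorian year; a month is taken as one twelfth of it.
YEAR = timedelta(days=365.2425)
MONTH = YEAR / 12
DAY = timedelta(days=1)


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Return the downtime budget for `period` at the given availability %."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_pct / 100
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


def fmt(downtime: timedelta) -> str:
    """Format a duration as HH:MM:SS, rounded to the nearest second."""
    total_seconds = round(downtime.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999):
        row = [fmt(allowed_downtime(pct, p)) for p in (DAY, MONTH, YEAR)]
        print(pct, *row)
```

Working from the percentage to a time budget makes it easier to ask the business a concrete question: "can you tolerate this many minutes of outage per month?"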
Recovery Point Objective
“Is the maximum targeted period in which data might be lost from an IT service due to a major incident”
Wikipedia - https://en.wikipedia.org/wiki/Recovery_point_objective
“In Database terms how many minutes or hours worth of data loss is acceptable for the application database being considered”

Recovery Time Objective
“Is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences”
Wikipedia - https://en.wikipedia.org/wiki/Recovery_time_objective
“In Database terms how many minutes or hours do we have to restore an application database system to normal functional operations”

Recovery Level Objective
“Is the targeted granularity of a system to recover in the event of a disaster (or disruption) in order to allow the system to continue normal functional operations”
“In Database terms do we need to consider things at the Instance level, Database level or Table level?”

Technology Capabilities
Technology                                | Potential Data Loss (RPO) | Potential Recovery Time (RTO) | Automatic Failover | Readable Secondaries
Availability Groups (Synchronous Commit)  | Zero                      | Seconds                       | Yes                | 0-3
Availability Groups (Asynchronous Commit) | Seconds                   | Minutes                       | No                 | 0-8
Failover Clustered Instances              | NA                        | Seconds - Minutes             | Yes                | NA
Database Mirroring (High Safety)          | Zero                      | Seconds                       | Yes                | NA
Database Mirroring (High Performance)     | Seconds                   | Minutes                       | No                 | NA
Transaction Log Shipping                  | [gap]                     | Minutes - Hours               | No                 | Not during a restore
Virtual Machines                          | NA                        | Seconds - Minutes             | Yes                | NA
Azure Site Recovery                       | NA                        | Minutes - Hours               | No                 | NA
Azure Backup                              | NA                        | Minutes - Hours               | No                 | NA
StorSimple                                | NA                        | Minutes - Hours               | No                 | NA
Cool Storage                              | NA                        | Minutes - Hours               | No                 | NA

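To illustrate working forward from requirements to technology, the sketch below encodes a few rows of the capabilities table and filters them against a required RPO, RTO and failover behaviour. It is a hypothetical helper, not a sizing tool: the `Magnitude` scale, the `Option` type and the `shortlist` function are names invented for this example, and only the four rows with unambiguous RPO and RTO values are included.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Magnitude(IntEnum):
    """Coarse ordering of the time scales used in the capabilities table."""
    ZERO = 0
    SECONDS = 1
    MINUTES = 2
    HOURS = 3


class Option(NamedTuple):
    name: str
    rpo: Magnitude            # potential data loss
    rto: Magnitude            # potential recovery time
    automatic_failover: bool


# Subset of the table; the remaining rows are omitted because their RPO is NA
# or not fully captured.
OPTIONS = [
    Option("Availability Groups (Synchronous Commit)", Magnitude.ZERO, Magnitude.SECONDS, True),
    Option("Availability Groups (Asynchronous Commit)", Magnitude.SECONDS, Magnitude.MINUTES, False),
    Option("Database Mirroring (High Safety)", Magnitude.ZERO, Magnitude.SECONDS, True),
    Option("Database Mirroring (High Performance)", Magnitude.SECONDS, Magnitude.MINUTES, False),
]


def shortlist(max_rpo: Magnitude, max_rto: Magnitude,
              need_automatic_failover: bool = False) -> List[str]:
    """Return the names of options whose capabilities satisfy the requirements."""
    return [
        option.name
        for option in OPTIONS
        if option.rpo <= max_rpo
        and option.rto <= max_rto
        and (option.automatic_failover or not need_automatic_failover)
    ]


if __name__ == "__main__":
    # Requirement: no data loss, recovery within seconds, automatic failover.
    print(shortlist(Magnitude.ZERO, Magnitude.SECONDS, need_automatic_failover=True))
```

The requirements go in first and the technology names come out last, which mirrors the order of the building blocks in this session.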
Service Level Agreements
“A contract outlining the obligations to be met for predefined scenarios”
Examples:
- Backup retention periods
- Uptime / Downtime requirements
- Recovery time requirements in the event of an issue
- Data corruption recovery times
- HA failover times
- DR failover times

Total Cost of Ownership (TCO)
- Hosting Costs
  - On-premises
  - Cloud
  - Hybrid
- Hardware Costs
  - Servers
  - SANs
  - Networking
- Software Costs
  - Application
  - Database
  - Monitoring
- Operational / Support Costs
  - Vendors
    - Business hours
    - After hours
  - Staff
    - Business hours
    - After hours
    - Training
    - Holidays
    - Sick leave
  - Consultants

Components
- SQL Server Database Engine (pretty well covered already)
- SQL Server Analysis Services
  - Supported in a clustered environment
- SQL Server Reporting Services
  - Scale-out deployment
  - Not supported in an FCI environment
  - Supported in AGs
  - NLB requirement
- SQL Server Integration Services
  - Not supported in an FCI environment
  - Supported in AGs

Versions & Editions
SQL Server Versions:
- 2005
- 2008 R2
- 2014
- 2016
- vNext
SQL Server Editions:
- Standard
- BI
- Enterprise
- Datacenter
The features supported by each version and edition affect the end design. Conversely, the features you require determine which version and edition the design needs.

Mitigations
- Site level loss
  - Natural disaster (fire, flood, earthquake)
  - Intentional attack
- Hardware level failure
  - Server failure
  - Disk failure (SAN, local disk)
  - Memory failure
  - Switch failure
  - Controller failure
- Network level failure
- Power failure
- Data loss
  - Malicious attack
  - Accidental human error
- Data corruption

Capacity & Performance
“Availability and Recoverability are not the only concerns”
Things to take into consideration:
- Distance between sites
- Bandwidth between sites
- Location of Application Servers relative to SQL Servers
- Hardware Specifications
- Network Configurations
- Storage Configurations
- Ability to expand or upgrade

Processes & Documentation
“When something goes wrong how do you respond?”
Things to take into consideration:
- Failover process
  - Who is involved and what do they need to do?
  - What triggers a response?
- Patching
- Failover Testing
- DR Testing
- Data Recovery Testing

Backup & Recovery
Recovery Models:
- Full
- Simple
- Bulk-Logged
Requirements:
- RPO
- RTO
HA/DR features will dictate the recovery models required. HA/DR features alone do not protect your data: a Backup and Recovery strategy is required to provide a complete solution. This ties back to the SLAs mentioned earlier. Backups and restores should be tested and timed on a regular basis.

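One way to make "tested and timed" concrete is to compare what you measure against the agreed objectives. The sketch below rests on a simplifying assumption that is not from the slides: under the Full recovery model, worst-case data loss is approximated by the interval between transaction log backups. The function name `check_backup_strategy` and its parameters are hypothetical.

```python
from datetime import timedelta
from typing import List


def check_backup_strategy(log_backup_interval: timedelta,
                          timed_restore: timedelta,
                          rpo: timedelta,
                          rto: timedelta) -> List[str]:
    """Return the objectives that are breached; an empty list means both are met.

    Assumption: worst-case data loss is roughly one log backup interval, and
    `timed_restore` is the duration measured in your most recent restore test.
    """
    breaches = []
    if log_backup_interval > rpo:
        breaches.append("RPO")
    if timed_restore > rto:
        breaches.append("RTO")
    return breaches


if __name__ == "__main__":
    result = check_backup_strategy(
        log_backup_interval=timedelta(minutes=60),
        timed_restore=timedelta(minutes=45),
        rpo=timedelta(minutes=30),
        rto=timedelta(hours=1),
    )
    print(result or "objectives met")
```

Re-running a check like this after every restore test shows when database growth has pushed the restore time past the RTO.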
Monitoring
Things to Monitor:
- Server Up/Down
- Instance Up/Down
- Agent Up/Down
- Database Up/Down
- SQL Error Log
- Windows Event Log
- Server Health
  - CPU Utilisation
  - Memory Utilisation
  - Disk Capacity
- Agent Job failures
- Failover Events
- WSFC Health
- AG Health
- DB Mirroring Health
- Log Shipping Health
- Data Transfer Rates
  - Redo rates
  - Latency behind
  - Log transfer rates

Patching
- Server Level
- Service Packs
- Cumulative Updates
- Hot Fixes
- Security Patches
- Rolling
- HA Technology
- Regular
- Proactive Schedule

Summary
We looked at the definitions of High Availability and Disaster Recovery.
We looked at the 6 building blocks of High Availability and Disaster Recovery:
- Down Time
- Recovery Objectives
- Technology
- Service Level Agreements
- Total Cost of Ownership
- Skill

Thank You Questions?
Thanks to all our Sponsors