High Availability Disaster Recovery Techniques in MS SQL

High Availability & Disaster Recovery Techniques in MS SQL Server { Muhammad Tariq 1 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

q HA and DR q Downtime Types q Planning an HA strategy q SQL Server HA Technologies q Planning a DR strategy q SQL Server DR Methods Contents 2 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Ø Ø refers to a system or component that is continuously operational for a desirably long length of time. is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period. refers to the availability of resources in a computer system, in the wake of component failures in the system. High Availability means 3 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

There are two parts to describe: Ø Disaster: meeting a natural or man-made tragedy Ø Recovery: Healing or Cure, the process of recovering from illness. Disaster Recovery means 4 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Ø Planned Downtime Un-planned Downtime Types 5 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Performing Database Maintenance Ø Ø Performing Batch Operations Ø Ø Ø Data Archrivals Summary Processes Performing an upgrade Ø Ø 6 Database Files Migrations Indexes Maintenance (Creating, Dropping, Rebuilding both Clustered and Non-Clustered Indexes) Minimize through Online Index Operations Installing SQL Server SPs and CUs Installing Windows or MS updates Updating Drivers and firmware updates Minimize through Rolling Upgrades Downtime Types - Planned Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Data Center Failure Ø Ø Server Failure Ø Ø Drive failure, a RAID controller failure, I/O subsystem software bug causing corruption Network Failure Ø Ø Failed power supply, Failed CPU, Failed Memory, Operating System Crashes I/O subsystem Failure Ø Ø Natural Disaster, Fire, Power loss, Failed network Connectivity Network connectivity failure Human Error Ø Dropping a table, deleting or updating data without specifying predicate, setting a database offline, Shutting down SQL Server instance. Downtime Types – Un-planned 7 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Requirements Ø Recovery Point Objectives (RPO) Ø Ø Recovery Time Objectives (RTO) Ø Ø The maximum allowable data loss when a failure occurs The maximum allowable downtime when a failure occurs Context for SLA requirements Ø When specifying that a database must be available 99. 99% of the time, is that 99. 99% 24 X 7 or is there an allowable maintenance window? Planning an HA strategy - Requirements 8 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Availability % DT/Year DT/Month DT/Week DT/Day 99% 36. 5 D 72 H 16. 8 H 2. 4 H 99% 3. 65 D 7. 2 H 1. 68 H 14. 4 M 99. 9% 8. 76 H 43. 8 M 10. 1 M 1. 44 M 99. 99% 52. 56 M 4. 38 M 1. 01 M 8. 66 S 99. 999% 5. 26 M 25. 9 S 6. 05 S 864. 3 MS Planning an HA strategy – Allowable downtime x=(n-y) * 100/n 9 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Ø Ø Ø Backup and Restore Methods Component Redundancy Replication Log Shipping Database Mirroring Basic Availability Group Always. On Availability Group Windows Failover Cluster HA Technologies 10 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Ø Recovery Models – Full, Bulk-Logged, Simple Backup Strategy Ø Ø Full backups, Differential Backups and Log Backups Differential Backups are very important but often neglected. Backup Compression, Backup Checksums, Mirrored backups, Stripped backups Recovery Strategy Ø Ø Actually test restoring your backups and have a plan for how you will do it. This is often ignored. Instant File initialization and backup compression can reduce restore times. Keeping VLF counts under control reduces recovery time portion of a restore Backup & Restore Methods 11 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø It is very common to not regularly restore database backups Ø Ø Ø It is also quite common for people not to run DBCC CHECKDB Ø Ø People take regular backups, but very rarely (or never) actually restore them Then, they find out in an emergency that their database backups are no good They are concerned about the resource usage on their production server(s) Consider using a “Restore Server” to restore your database backups Ø Ø You can restore each database and then run DBCC CHECKDB on it This can easily be automated. You can use an older server or new desktop machine Backup & Restore Methods – Using a Secondary server to Restore 12 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø It is important to have redundant components for a database server Ø Ø This helps avoid ever having to use your HA/DR technology You want to eliminate single points of failure where possible Ø Ø Ø Multiple power supplies plugged into separate circuits Multiple network ports, plugged into separate network switches Appropriate RAID protection for all of your logical drives Hot-swappable components can help avoid down time Having some cold spares available is also a good idea Component Redundancy 13 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø All Microsoft HA/DR technologies have some failover duration Ø Ø It is much better to avoid some unplanned failovers with redundancy Ø Ø Ø Traditional FCI must move cluster resources and start SQL Server on the new node Availability groups and DBM require database property changes Log shipping requires a manual failover Component redundancy can help avoid unplanned failovers from hardware failures This improves your overall uptime statistics Take advantage of every possibility to make your server more robust Ø Ø Ø The extra hardware cost involved is usually relatively small Be ready for resistance for financial reasons Keep in mind that this is a database server, not a web server Component Redundancy vs HA/DR 14 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Replication is a broad set of technologies that enable data to be copied and distributed between servers and then synchronized to maintain consistency Ø Ø Source database is a Publisher, destination is a Subscriber Ø Ø You can replicate the entire database or just a portion of it Log reader agent picks up all write activity from Publisher database This adds some read I/O workload to the log file Replication changes are temporarily stored in a Distribution database You can have multiple subscribers in multiple locations Ø You can additional indexes to subscriber databases for reporting Replication 15 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Replication 16 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Provides database-level protection Ø Ø Ø Can have multiple copies in multiple locations Databases must be in FULL recovery model at all times Requires a manual failover (although you can write code to partially automate) Some data loss is possible (since last log backup that was copied over) Log shipping is most commonly used for DR purposes Ø Ø Ø Can be used to protect against user error when you have a delayed restore Can be combined with most other HA technologies Does not add any extra performance overhead to primary database Log Shipping 17 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Log Shipping 18 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Database-level high availability, deprecated in SQL Server 2012 Ø Ø Principal database and mirror database, on separate instances Ø Ø Ø Only user databases can be mirrored Databases must be in Full recovery model Synchronous and asynchronous modes Ø Ø Ø Still works in SQL Server 2016, still a good solution for many scenarios Must use synchronous mode with a witness for automatic failover Asynchronous mode is only allowed in Enterprise Edition Database mirroring offers very fast, automatic failover Ø Only a single database, only one mirror Database Mirroring 19 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Database Mirroring 20 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø New feature in SQL Server 2016 Standard Edition Ø Ø Ø Basic AG enables a primary database to maintain a single replica. This replica can use either synchronous or asynchronous commit mode Asynchronous commit mode is a big advantage/improvement! Basic Availability Group Limitations Ø Ø Ø Limit of two replicas (primary and secondary No read access on secondary replica No backups on secondary replica Only one database can be in a basic availability group BAG cannot be upgraded to a regular Always. On AG Basic availability groups are only supported on Standard Edition Basic Availability Groups (BAG) 21 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Availability group contains one or more user databases that failover together Ø Ø Ø Requires Windows failover cluster instance, but not shared storage Enterprise Edition-only feature, until SQL Server 2016 Availability database is a database that belongs in an AG Ø Ø Primary database is the read-write copy (limit 1) Secondary database is the read-only or non-readable copy Ø Ø Up to four on SQL Server 2012 and eight on SQL Server 2014 Databases must be in Full recovery model at all times Offers relatively fast, automatic failover Can offload read-only activity, but no schema changes are allowed Ø Makes it harder to use as a replacement for replication for reporting purposes Always. On Availability Groups 22 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Availability Groups – How logs shipped? 23 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Availability Groups 24 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø SQL Server failover cluster implemented on a Windows Server failover cluster Ø Ø Ø Provides instance-level high availability Ø Ø Multiple nodes, one or more instances Requires shared storage, which is a single point of failure You can use SMB 3. 0 file shares for SQL Server storage instead of a SAN tempdb can be located on each node with SQL Server 2012 or newer All databases, logins, Agent jobs are included Failover time is longer than most other technologies Ø Ø Depends on how long crash recovery takes for each database Keep your VLF counts under control Windows Failover Clustering 25 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Windows Failover Clustering 26 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Feature Name Enterprise Edition Standard Edition Express Edition Partial database availability Yes No No Backup Compression Yes No Database Snapshot Yes No No Online Index Operation Yes No No Log Shipping Yes No Replication Yes Subscriber Only Database Mirroring Yes, Synch Only Witness Only Failover Clustering Yes No Always. On Availability Groups Yes No, Until 2016 No High Availability Features By Edition 27 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø Ø Ø Integral part of Highly-Available system design Total loss can be recovered from backups only Priority list – Ø Ø Ø What needs to be brought online Data Loss SLA (RPO) Downtime SLA (RTO) Consider and outline all questions that can arose during a disaster (What If? What? Where? When? How? ) HA/DR Test Planning and Testing Planning a DR Strategy 28 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Ø HA/DR is much more than just using a technology or feature Ø Ø Understand your RPO and RTO SLA requirements Understand your budget and infrastructure limitations Make sure you have a good backup/restore strategy, regardless of your HA/DR choices Keep in mind that you can combine HA/DR features to have a more robust solution Summary 29 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org

Q&A Thank You! 30 Muhammad Tariq - Presented at PASS Lahore General Conference 2017 – Sponsored by Confiz Limited. https: //lahore. pass. org