Overview of high availability in Microsoft SQL Server

Overview of high availability in Microsoft SQL Server Szymon Wójcik

Agenda Introduction What is availability? What is and why to choose high availability? Key factors to consider for high availability scenario High availability techniques in Microsoft SQL Server Replication Log shipping Mirroring Failover clustering Discussion

PLSSUG Cracow Partners

Agenda Introduction What is availability? What is and why to choose high availability? Key factors to consider for high availability scenario High availability techniques in Microsoft SQL Server Replication Log shipping Mirroring Failover clustering Discussion

Introduction Szymon Wójcik Experience with MS SQL Server since 2000 (dev/admin) MCITP: DBA SQL Server 2005 Interests: Performance tuning High availability Blog – sqlphobosq. wordpress. com Twitter - @phobosq

Agenda Introduction What is availability? What is and why to choose high availability? Key factors to consider for high availability scenario High availability techniques in Microsoft SQL Server Replication Log shipping Mirroring Failover clustering Discussion

Availability [1/5] One of the concepts defined within ITIL Ability to perform its agreed function when required Determined by: Reliability – how long (MTBF) Maintainability – how quickly restored (MTRS) Serviceability – contract conditions Performance Security Confidentiality Integrity Availability

Availability [2/5]

Availability [3/5] – figures for one week Allowed downtime duration per week [hh: mm: ss format] Availability level 8 x 5 (40 hours/week) 24 x 7 (168 hours/week) 80% 08: 00 33: 36: 00 90% 04: 00 16: 48: 00 95% 02: 00 08: 24: 00 98% 00: 48: 00 03: 21: 36 99% 00: 24: 00 01: 40: 48 99, 9% 00: 02: 24 00: 10: 05 99, 99% 00: 14, 4 00: 01 99, 999% 00: 01, 44 00: 06

Availability [4/5] – figures for one year Allowed downtime duration per year [DD. hh: mm: ss format] Availability level 8 x 5 (40 hours/week) 24 x 7 (168 hours/week) 80% 52. 00: 00 73. 00: 00 90% 26. 00: 00 36. 12: 00 95% 13. 00: 00 18. 06: 00 98% 5. 04: 48: 00 7. 07: 24: 00 99% 2. 14: 24: 00 3. 15: 48: 00 99, 9% 0. 06: 14: 24 0. 08: 45: 36 99, 99% 0. 00: 37: 26 0. 00: 52: 33 99, 999% 0. 00: 03: 45 0. 00: 05: 15

Availability [5/5] – important notes Availability != Uptime (service may be up but unavailable) Scheduled downtime does not have to cause unavailability (up to definition in SLA)

Agenda Introduction What is availability? What is and why to choose high availability? Key factors to consider for high availability scenario High availability techniques in Microsoft SQL Server Replication Log shipping Mirroring Failover clustering Discussion

High availability - definition System design approach and service implementation that ensures a certain level of operational performance (Wikipedia) Masks the effects of hardware or software failure Maintains availability of applications so that perceived downtime is minimized (Microsoft)

High availability != disaster recovery High availability is used for ensuring for meeting Service Level Target for availability Disaster recovery is ensuring operational continuity They can be used complementary – HA can minimize the need of invoking DR, but never replace it

Why to choose high availability For users: Minimizes downtime probability Allows to sustain a failure if properly designed For administrators: Simplifies migration effort Minimizes risk of continuity

Agenda Introduction What is availability? What is and why to choose high availability? Key factors to consider for high availability scenario High availability techniques in Microsoft SQL Server Replication Log shipping Mirroring Failover clustering Discussion

Single point of failure A whole system is as strong as the weakest link

Hardware redundancy Introduce additional hardware to minimize risk of failure

Hardware redundancy Not only whole machines may be multiplicated to become fault tolerant Also components: Power supplies CPUs Hard disks Network interface cards Storage controllers

Standby node A standby node is a machine in a HA system that takes over in case of primary server failure Three types: Cold standby – Unplugged, needs to be prepared before use Warm standby – Ready to use, but requires manual switch Hot standby – Ready to use, takes over automatically Fail over = switching from primary to standby Fail back = return to primary There may be more than one standby in HA scenario!

Load balancing vs failover Load balancing – distributing of workload between several peer servers If one goes down, others take over Workload distributed by load balancer Failover – automatic switch to standby Standby is not active Switch initiated upon loss of heartbeat

Other points High availability requires additional costs – multiple components must be present according to design in order to meet requirements It may become complex to maintain – additional CIs present in environment that need to be kept up-todate Hardware design must be followed by software to fully benefit from HA KISS – Keep It Simple and Stupid

Agenda Introduction What is availability? What is and why to choose high availability? Key factors to consider for high availability scenario High availability techniques in Microsoft SQL Server Replication Log shipping Mirroring Failover clustering Discussion

High availability in Microsoft SQL Server, as a RDBMS, provides means for failover scenario Load balancing is difficult and must be properly designed in order to work High availability in SQL Server does not prevent logical data corruption – periodic DBCC checks are advised

HA methods overview in SQL Server Method What it does Replication Transfers completed transactions to standby nodes Log shipping Performs regular log backups, copy to standby and restore Database mirroring Replays transactions as they are logged Failover cluster Monitors Windows service status and transfers execution Standby type Cold/warm Cold Warm/hot Hot # of standby nodes Any Remark • • • Standby is accessible May allow for updates Conflicts may appear • Database unavailable during restore Standby may be accessible Any • 1 • • Standby unavailable Requires third server to allow hot standby • Requires shared (or replicated) storage Requires identical hardware Failover = downtime Any • •

Replication Three server roles in replication: Publisher Distributor Subscriber Three types: Snapshot Transactional Merge Two subscription methods: Push – Distributor pushes articles to Subscribers Pull – Subscribers downloads from Distributor

Replication topology

Possible application of replication Create a second copy of data to be used in case of emergency (DR) Create a copy of data to offload the server (load balancing) Allow offline users to work with data and upload their changes later (high availability)

Replication agents External programs which are used to implement replication: Snapshot Agent: creates snapshots Log Reader Agent: Reads transaction log Marks transactions for replication Distribution Agent: Dispatches transactions to Subscriber Merge Agent: Downloads remote and uploads local changes Resolves conflicts in merge replication

Snapshot replication Publisher makes a copy of a database which is applied at Subscriber Good for small, static data: Whole snapshot is applied every time – the changes which appear after snapshot will be applied with next snapshot Requires sufficient bandwidth

Transactional replication Starts with a snapshot Transactions are recorded at Publisher and replayed at Subscriber May allow for updatable subcriptions If Subscriber is offline, records are stored at the Distributor

Merge replication Starts with a snapshot Merges changes between Publisher and Subscribers Allows synchronization via HTTPS (since SQL Server 2008) Allows the most autonomous design – e. g. mobile users, multiple branch offices working on the same data

Replication how-to Configure Distributor Configure Publisher: Select replication type Select articles to be published [Optional] Set up article filtering Set up security Configure Subscribers: Connect to Distributor Select subscription method Apply snapshot [Transactional/merge] Synchronize changes

Failover in replication Stop subscription Direct all traffic from Publisher to Subscriber: Change application connection strings Change DNS aliases, if required, or Change IP addresses

Failback in replication After restoring Publisher, restore a copy of database from Subscriber Direct all traffic from Subscriber to Publisher Reestablish the replication

Log shipping Keeps a standby by automating backup, copy and restore process Three server roles in log shipping: Primary Secondary Monitor

How it works? [1/2] Restore a full backup from Primary to Secondary and then: A job runs on Primary which backs up transaction log Second job copies the log backup to Secondary Third job on Secondary restores the log after it’s copied [Optional] Monitor server tracks performance and incidents

How it works? [2/2]

Failover in log shipping Copy transaction log backups from primary to secondary Backup tail of the log on primary Restore all backups except tail-log with NORECOVERY Restore tail-log with RECOVERY Disable log shipping jobs Redirect client traffic to secondary

Drawbacks of log shipping You can’t miss a transaction log backup Network traffic generated has to be considered You are always behind on Secondary is read-only

Database mirroring Allows to keep your standby up-to-date Allows automatic failover Cost-effective alternative to clustering Available in Standard Edition (2005 – 2008 R 2) Does not require cluster capable hardware Might be in implemented when Windows Authentication mode is not possible (using certificates)

How it works?

Database mirroring modes High availability (with witness) Automatic failover Synchronous transaction commit (principal commits after mirror confirms it’s commit) High protection (without witness) Manual failover Synchronous transaction commit High performance (without witness) Manual failover Asynchronous transaction commit

Manual failover in database mirroring Can be done with one mouse click in SSMS Requires client traffic redirection: Possible within connection string using Failover Partner command

Automatic failover in database mirroring Initiated automatically by witness if there is no quorum: If principal is unavailable, fails over to mirror Does nothing if mirror becomes unavailable Fails over also if principal is up but unreachable from network! Requires client traffic redirection: Possible within connection string using Failover Partner command

Failover clustering Provides protection on a server level: Automatic failover in case of server failure Fails over logins, endpoints and jobs Combines multiple machines (nodes) in a single virtual server Requires cluster-capable hardware: Shared or common storage Certified server hardware

Clustering

Failover in a cluster

Summary Method What it does Replication Transfers completed transactions to standby nodes Log shipping Performs regular log backups, copy to standby and restore Database mirroring Replays transactions as they are logged Failover cluster Monitors Windows service status and transfers execution Standby type Cold/warm Cold Warm/hot Hot # of standby nodes Any Remark • • • Standby is accessible May allow for updates Conflicts may appear • Database unavailable during restore Standby may be accessible Any • 1 • • Standby unavailable Requires third server to allow hot standby • Requires shared (or replicated) storage Requires identical hardware Failover = downtime Any • •

Discussion

THANK YOU!