Clustering Technology For Fault Tolerance
Jim Gray
Microsoft Research
http://www.research.Microsoft.com/~Gray

What is Wolfpack?
• A consortium of 60 HW & SW vendors (everybody who is anybody)
• A set of APIs for clustering and fault tolerance
• An enhancement to NT™ Server (in beta test)
• Key concepts
  – System: a particular node
  – Cluster: a collection of systems working together
  – Resource: a hardware or software module
  – Resource dependency: one resource needs another
  – Resource group: fails over as a unit; dependencies do not cross group boundaries
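
The key concepts above can be sketched as a small data model. This is an illustrative sketch only, not the Wolfpack API: the class names (`Resource`, `ResourceGroup`), the method names, and the example resources are all hypothetical. It assumes that a group's resources are brought online dependencies-first when the group moves to another system, which the slide does not state explicitly.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Resource:
    """A hardware or software module (hypothetical model, not the real API)."""
    name: str
    depends_on: list = field(default_factory=list)  # names of other resources


@dataclass
class ResourceGroup:
    """A set of resources that fails over as a unit."""
    name: str
    owner: str = ""  # the system (node) currently hosting the group
    resources: dict = field(default_factory=dict)  # resource name -> Resource

    def add(self, resource):
        self.resources[resource.name] = resource

    def validate(self):
        # Dependencies must not cross group boundaries.
        for res in self.resources.values():
            for dep in res.depends_on:
                if dep not in self.resources:
                    raise ValueError(
                        f"{res.name} depends on {dep}, "
                        f"which is outside group {self.name}"
                    )

    def start_order(self):
        # Assumed policy: bring each resource online after its dependencies.
        self.validate()
        graph = {r.name: set(r.depends_on) for r in self.resources.values()}
        return list(TopologicalSorter(graph).static_order())

    def fail_over(self, surviving_system):
        # The whole group moves together; return the restart order.
        self.owner = surviving_system
        return self.start_order()


group = ResourceGroup("file_service", owner="node_a")
group.add(Resource("disk"))
group.add(Resource("ip_address"))
group.add(Resource("net_name", depends_on=["ip_address"]))
group.add(Resource("file_share", depends_on=["disk", "net_name"]))

order = group.fail_over("node_b")
print(group.owner, order)
```

Keeping dependencies inside one group is what makes the group a safe unit of failover: everything a resource needs moves with it.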

What Wolfpack Supports in V1
• Two-node failover (twin-tail SCSI)
• Apps:
  – File, Print, web server, IP address, Net Name
  – Most of Microsoft BackOffice (SQL, Exchange, Viper, Falcon, …)
  – Oracle
  – SAP
  – many others
• Easy to program, operate, use

Cluster Advantages
• Clients and Servers made from the same stuff.
  – Inexpensive: Built with commodity components
• Fault tolerance:
  – Spare modules mask failures
• Modular growth
  – grow by adding small modules
• Parallel data search
  – use multiple processors and disks

What Happens When a Component Fails?
• Redundant disk or path: configure around it.
• Non-redundant software: restart.
• Non-redundant hardware: migrate software to surviving nodes.
• Fault detection: 1 ms to 10 sec.
• Failover: 1 sec to 1 min.
• This is standard in Tandem, Teradata, VMScluster
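
The recovery rules and timing ranges on this slide can be written down as a lookup table plus a little arithmetic. The sketch below is illustrative only: the enum and function names are hypothetical. It also assumes that the outage seen by a client is roughly detection time plus failover time, whereas the slide gives the two ranges separately.

```python
from enum import Enum


class Failure(Enum):
    """The three failure cases named on the slide."""
    REDUNDANT_DISK_OR_PATH = "redundant disk or path"
    NON_REDUNDANT_SOFTWARE = "non-redundant software"
    NON_REDUNDANT_HARDWARE = "non-redundant hardware"


# Recovery action for each case, taken directly from the slide.
RECOVERY = {
    Failure.REDUNDANT_DISK_OR_PATH: "configure around it",
    Failure.NON_REDUNDANT_SOFTWARE: "restart",
    Failure.NON_REDUNDANT_HARDWARE: "migrate software to surviving nodes",
}

# (best case, worst case) in seconds, from the slide.
DETECTION_S = (0.001, 10.0)  # fault detection: 1 ms to 10 sec
FAILOVER_S = (1.0, 60.0)     # failover: 1 sec to 1 min


def recovery_action(failure):
    """Return the recovery action the slide lists for this failure case."""
    return RECOVERY[failure]


def outage_bounds():
    """Assumed model: outage = detection time + failover time."""
    best = DETECTION_S[0] + FAILOVER_S[0]
    worst = DETECTION_S[1] + FAILOVER_S[1]
    return best, worst


best, worst = outage_bounds()
print(recovery_action(Failure.NON_REDUNDANT_HARDWARE))
print(f"outage between about {best:.3f} s and {worst:.0f} s")
```

Under that additive assumption, the ranges on the slide imply an interruption of about one second at best and a little over a minute at worst.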