Single System Image Infrastructure and Tools

Cluster Computer Architecture
• Parallel Applications
• Sequential Applications
• Parallel Programming Environment
• Cluster Middleware (Single System Image and Availability Infrastructure)
• PC/Workstation
• Communications Software
• Network Interface Hardware
• Cluster Interconnection Network/Switch

Major Issues in Cluster Design
• Enhanced Performance (performance @ low cost)
• Enhanced Availability (failure management)
• Single System Image (look-and-feel of one system)
• Size Scalability (physical & application)
• Fast Communication (networks & protocols)
• Load Balancing (CPU, Net, Memory, Disk)
• Security and Encryption (clusters of clusters)
• Distributed Environment (social issues)
• Manageability (administration and control)
• Programmability (simple API if required)
• Applicability (cluster-aware and non-aware applications)

A Typical Cluster Computing Environment
• Applications
• PVM / MPI / RSH
• ???
• Hardware/OS

The Missing Link Is Provided by Cluster Middleware/Underware
• Applications
• PVM / MPI / RSH
• Middleware or Underware
• Hardware/OS

Middleware Design Goals
• Complete Transparency (Manageability):
  • Lets the user see a single cluster system.
  • Single entry point, ftp, telnet, software loading, ...
• Scalable Performance:
  • Easy growth of the cluster
  • No change of API and automatic load distribution.
• Enhanced Availability:
  • Automatic recovery from failures
  • Employ checkpointing and fault-tolerant technologies
  • Handle consistency of data when replicated.

What is Single System Image (SSI)?
• SSI is the illusion, created by software or hardware, that presents a collection of computing resources as a single, unified resource.
• SSI makes the cluster appear like a single machine to the user, to applications, and to the network.

Benefits of SSI
• Transparent use of system resources.
• Transparent process migration and load balancing across nodes.
• Improved reliability and higher availability.
• Improved system response time and performance.
• Simplified system management.
• Reduction in the risk of operator errors.
• No need to be aware of the underlying system architecture to use these machines effectively.

Desired SSI Services
• Single Entry Point:
  • telnet cluster.my_institute.edu
  • rather than telnet node1.cluster.institute.edu
• Single File Hierarchy: /proc, NFS, xFS, AFS, etc.
• Single Control Point: Management GUI
• Single virtual networking
• Single memory space: Network RAM/DSM
• Single Job Management: Glunix, Codine, LSF
• Single GUI: like a workstation/PC windowing environment; it may be Web technology
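
The single entry point service can be illustrated with a small sketch: users name only the cluster, and a dispatcher picks the node. The class name, the short node names, and the round-robin policy below are assumptions made for illustration; none of them comes from the systems listed on this slide.

```python
from itertools import cycle


class SingleEntryPoint:
    """Hypothetical dispatcher: one public cluster name, many back-end nodes."""

    def __init__(self, cluster_name, nodes):
        if not nodes:
            raise ValueError("cluster needs at least one node")
        self.cluster_name = cluster_name
        # Round-robin is an assumed policy; a real system might use load data.
        self._next_node = cycle(nodes)

    def connect(self, user):
        # The user only ever names the cluster; the node is chosen here.
        node = next(self._next_node)
        return f"{user}@{self.cluster_name} -> {node}"


entry = SingleEntryPoint("cluster.my_institute.edu", ["node1", "node2", "node3"])
sessions = [entry.connect("alice") for _ in range(4)]
```

Each session string records which node served a login that was addressed to the cluster name alone.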

Availability Support Functions
• Single I/O space:
  • Any node can access any peripheral or disk device without knowledge of its physical location.
• Single process space:
  • Any process on any node can create processes cluster-wide, and they communicate through signals, pipes, etc., as if they were on a single node.
• Checkpointing and process migration:
  • Saves the process state and intermediate results from memory to disk to support rollback recovery when a node fails.
  • RMS load balancing, ...
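
Checkpointing with rollback recovery can be sketched in a few lines. This is a minimal single-machine illustration, assuming the process state is a small picklable dictionary; the function names, the file name `job.ckpt`, and the simulated failure are all hypothetical and do not describe any particular cluster product.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to disk atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename over the previous checkpoint


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, fail_at=None):
    """Sum 0..total_steps-1, checkpointing after every step."""
    state = load_checkpoint(path, {"step": 0, "partial_sum": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["partial_sum"] += state["step"]
        state["step"] += 1
        save_checkpoint(state, path)
    return state["partial_sum"]


checkpoint_path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run(checkpoint_path, total_steps=10, fail_at=6)  # fails part-way through
except RuntimeError:
    pass
result = run(checkpoint_path, total_steps=10)  # resumes from the checkpoint
```

The second call does not redo steps 0 to 5; it rolls back to the last saved state and continues from step 6.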

SSI Levels
SSI levels of abstraction:
• Application and Subsystem Level
• Operating System Kernel Level
• Hardware Level

SSI at Application and Sub-system Levels
• Application
  • Examples: batch system and system management
  • Boundary: an application
  • Importance: what a user wants
• Sub-system
  • Examples: Distributed DB, OSF DME, Lotus Notes, MPI, PVM
  • Boundary: a sub-system
  • Importance: SSI for all applications of the sub-system
• File system
  • Examples: Sun NFS, OSF DFS, NetWare, and so on
  • Boundary: shared portion of the file system
  • Importance: implicitly supports many applications and subsystems
• Toolkit
  • Examples: OSF DCE, Sun ONC+, Apollo Domain
  • Boundary: explicit toolkit facilities: user, service name, time
  • Importance: best level of support for heterogeneous systems
(c) In Search of Clusters

SSI at OS Kernel Level
• Kernel/OS layer
  • Examples: Solaris MC, Unixware, MOSIX, Sprite, Amoeba/GLunix
  • Boundary: each name space: files, processes, pipes, devices, etc.
  • Importance: kernel support for applications, administration subsystems
• Kernel interfaces
  • Examples: UNIX (Sun) vnode, Locus (IBM) vproc
  • Boundary: type of kernel objects: files, processes, etc.
  • Importance: modularises SSI code within the kernel
• Virtual memory
  • Examples: none supporting the OS kernel
  • Boundary: each distributed virtual memory space
  • Importance: may simplify implementation of kernel objects
• Microkernel
  • Examples: Mach, PARAS, Chorus, OSF/1 AD, Amoeba
  • Boundary: each service outside the microkernel
  • Importance: implicit SSI for all system services
(c) In Search of Clusters

SSI at Hardware Level
(Sits below the Application and Subsystem Level and the Operating System Kernel Level.)
• Memory
  • Examples: SCI, DASH
  • Boundary: memory space
  • Importance: better communication and synchronization
• Memory and I/O
  • Examples: SCI, SMP techniques
  • Boundary: memory and I/O device space
  • Importance: lower overhead cluster I/O
(c) In Search of Clusters

SSI Characteristics
• Every SSI has a boundary.
• Single system support can exist at different levels within a system, one able to be built on another.

SSI Boundaries
[Figure: a batch system and its SSI boundary. (c) In Search of Clusters]

Relationship Among Middleware Modules
[Figure: relationship among middleware modules]

SSI via OS Path
1. Build as a layer on top of the existing OS
  • Benefits: makes the system quickly portable, tracks vendor software upgrades, and reduces development time.
  • That is, new systems can be built quickly by mapping new services onto the functionality provided by the layer beneath, e.g. Glunix.
2. Build SSI at kernel level (true cluster OS)
  • Good, but cannot leverage OS improvements made by the vendor.
  • E.g. Unixware, Solaris-MC, and MOSIX.
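
The first path, a layer on top of the existing OS, can be pictured with a toy sketch in which a new service ("run this job somewhere in the cluster") is mapped onto the unchanged process interface of the OS underneath. The class `UserLevelClusterLayer`, the node names, and the least-jobs policy are hypothetical, and the job is executed locally as a stand-in for forwarding it to the chosen node; this is not how Glunix itself is implemented.

```python
import subprocess
import sys


class UserLevelClusterLayer:
    """Hypothetical layer above the OS: no kernel changes, only existing OS calls."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.jobs_per_node = {name: 0 for name in self.node_names}

    def run(self, argv):
        # New service: choose the node with the fewest jobs (assumed policy).
        node = min(self.node_names, key=lambda n: self.jobs_per_node[n])
        self.jobs_per_node[node] += 1
        # Mapping onto the layer beneath: the standard OS process interface.
        # A real layer would forward argv to `node`; here it runs locally.
        completed = subprocess.run(argv, capture_output=True, text=True, check=True)
        return node, completed.stdout.strip()


layer = UserLevelClusterLayer(["node1", "node2"])
first_node, first_output = layer.run([sys.executable, "-c", "print('hello')"])
second_node, _ = layer.run([sys.executable, "-c", "print('again')"])
```

Because the layer uses only interfaces the OS already exports, a vendor upgrade of the OS underneath does not require changes to it, which is the portability benefit the slide describes.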

SSI Systems & Tools
• OS level SSI:
  • SCO NSC UnixWare; Solaris-MC; MOSIX, ...
• Middleware level SSI:
  • PVM, TreadMarks (DSM), Glunix, Condor, Codine, Nimrod, ...
• Application level SSI:
  • PARMON, Parallel Oracle, ...

SCO NonStop Cluster for UnixWare
http://www.sco.com/products/clustering/
Each UP or SMP node contains:
• Users, applications, and systems management
• Standard OS kernel calls, plus extensions
• Standard SCO UnixWare with clustering hooks
• Modular kernel extensions
• Devices
Nodes connect to other nodes through ServerNet.

How Do NonStop Clusters Work?
Modular extensions and hooks provide:
• Single clusterwide filesystem view
• Transparent clusterwide device access
• Transparent swap space sharing
• Transparent clusterwide IPC
• High performance internode communications
• Transparent clusterwide processes, migration, etc.
• Node down cleanup and resource failover
• Transparent clusterwide parallel TCP/IP networking
• Application availability
• Clusterwide membership and cluster timesync
• Cluster system administration
• Load leveling

Sun Solaris MC: A High Performance Operating System for Clusters
• A distributed OS for a multicomputer, a cluster of computing nodes connected by a high-speed interconnect
• Provides a single system image, making the cluster appear like a single machine to the user, to applications, and to the network
• Built as a globalization layer on top of the existing Solaris kernel
• Interesting features:
  • extends existing Solaris OS
  • preserves the existing Solaris ABI/API compliance
  • provides support for high availability
  • uses C++, IDL, CORBA in the kernel
  • leverages Spring technology

Solaris-MC: Solaris for MultiComputers
• global file system
• globalized process management
• globalized networking and I/O
http://www.sun.com/research/solaris-mc/

Solaris MC Components
• Object and communication support
• High availability support
• PXFS global distributed file system
• Process management
• Networking

MOSIX: Multicomputer OS for UNIX
http://www.mosix.cs.huji.ac.il/ || mosix.org
• An OS module (layer) that provides the applications with the illusion of working on a single system.
• Remote operations are performed like local operations.
• Transparent to the application; the user interface is unchanged.
Layers:
• Application
• PVM / MPI / RSH
• MOSIX
• Hardware/OS

Main Tool
Preemptive process migration that can migrate any process, anywhere, anytime
• Supervised by distributed algorithms that respond on-line, and transparently, to global resource availability.
• Load balancing: migrate processes from overloaded to under-loaded nodes.
• Memory ushering: migrate processes from a node that has exhausted its memory, to prevent paging/swapping.
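
The two migration triggers above can be sketched as a single decision function. This is a simplified, centralized illustration rather than MOSIX's distributed algorithms; the `Node` class, the `min_free_mb` threshold, and the "load gap of at least 2" rule are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_memory_mb: int
    processes: list = field(default_factory=list)  # process ids on this node

    @property
    def load(self):
        return len(self.processes)


def pick_migration(nodes, min_free_mb=64):
    """Return (pid, source, target, reason), or None if nothing should move."""
    if len(nodes) < 2:
        return None
    # Memory ushering: a node that is out of memory sheds a process first.
    starved = [n for n in nodes if n.free_memory_mb < min_free_mb and n.processes]
    if starved:
        source = min(starved, key=lambda n: n.free_memory_mb)
        others = [n for n in nodes if n is not source]
        target = max(others, key=lambda n: n.free_memory_mb)
        if target.free_memory_mb < min_free_mb:
            return None  # nowhere with spare memory to move to
        return source.processes[0], source.name, target.name, "memory ushering"
    # Load balancing: move work from the busiest to the idlest node.
    source = max(nodes, key=lambda n: n.load)
    target = min(nodes, key=lambda n: n.load)
    if source.load - target.load < 2:
        return None  # already balanced enough; migration would just ping-pong
    return source.processes[0], source.name, target.name, "load balancing"


busy = [Node("a", 512, [1, 2, 3, 4]), Node("b", 512, [5]), Node("c", 512, [])]
load_decision = pick_migration(busy)

tight = [Node("a", 16, [7]), Node("b", 900, [8, 9])]
memory_decision = pick_migration(tight)
```

In the second case node "a" has the lighter CPU load, yet its process is still moved, because exhausted memory takes priority over load in this sketch.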

MOSIX for Linux at HUJI
• A scalable cluster configuration:
  • 50 Pentium-II 300 MHz
  • 38 Pentium-Pro 200 MHz (some are SMPs)
  • 16 Pentium-II 400 MHz (some are SMPs)
• Over 12 GB cluster-wide RAM
• Connected by the Myrinet 2.56 Gb/s LAN
• Runs Red Hat 6.0, based on kernel 2.2.7
• Upgrade: HW with Intel, SW with Linux
• Download MOSIX: http://www.mosix.cs.huji.ac.il/