Lecture 5 Cluster Middleware and Single System Image

  • Slides: 15
Lecture 5
• Cluster Middleware and Single System Image
• Resource Management and Scheduling (RMS)

Cluster Middleware and Single System Image
A Single System Image (SSI) makes a collection of interconnected nodes appear as a unified resource. It creates the illusion that the distributed hardware and software resources are a single, more powerful resource. SSI is supported by a middleware layer that resides between the OS and the user-level environment. The middleware consists of two sub-layers: the SSI Infrastructure and the System Availability Infrastructure (SAI). SAI enables cluster services such as checkpointing, automatic failover, recovery from failure, and fault tolerance.

Contd…
• SSI Levels or Layers
  ➢ Hardware (Digital (DEC) Memory Channel, hardware DSM, and SMP techniques)
  ➢ Operating System Kernel – Gluing Layer (Solaris MC and GLUnix)
  ➢ Applications and Subsystems – Middleware
    ❖ Applications
    ❖ Runtime Systems
    ❖ Resource Management and Scheduling Software (LSF and CODINE)

Contd…
• SSI Boundaries
  ➢ Every SSI has a boundary.
  ➢ SSI can exist at different levels within a system, and one SSI can be built on top of another.

Contd…
• SSI Benefits
  ➢ It provides a view of all system resources and activities from any node of the cluster.
  ➢ It frees the end user from having to know where the application will run.
  ➢ It frees the operator from having to know where a resource is located.
  ➢ It allows the administrator to manage the entire cluster as a single entity.
  ➢ It allows both centralized and decentralized system management.
  ➢ It simplifies system management.
  ➢ It provides location-independent message communication.
  ➢ It tracks the locations of all resources.

Contd…
• Middleware Design Goals
  ➢ Transparency
  ➢ Scalable Performance
  ➢ Enhanced Availability

Contd…
Key Services of SSI and Availability Infrastructure
• SSI Support Services
  ➢ Single Point of Entry
  ➢ Single File Hierarchy
  ➢ Single Point of Management and Control
  ➢ Single Virtual Networking
  ➢ Single Memory Space
  ➢ Single Job Management System
  ➢ Single User Interface

Contd…
• Availability Support Functions
  ➢ Single I/O Space
  ➢ Single Process Space
  ➢ Checkpointing and Process Migration

Resource Management and Scheduling (RMS)
RMS is the act of distributing applications among computers to maximize their throughput. The software that performs RMS consists of two components:
  ➢ Resource manager: It is concerned with problems such as locating and allocating computational resources, as well as process creation and migration.
  ➢ Resource scheduler: It is concerned with tasks such as queuing applications, as well as resource location and assignment.

Contd…
RMS is needed for purposes such as:
  ➢ Load balancing
  ➢ Utilizing spare CPU cycles
  ➢ Providing fault-tolerant systems
  ➢ Managing access to powerful systems

Services provided by RMS environment
• Process migration: A process can be suspended, moved, and restarted on another computer within the RMS environment.
• Checkpointing: A snapshot of an executing program's state is saved, so that the program can later be restarted from that point.
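The checkpointing idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular RMS package does it: the job is a toy loop that sums integers, its whole state fits in a dictionary, and the names `run_job`, `save_checkpoint`, and `load_checkpoint` are hypothetical.

```python
import os
import pickle


def save_checkpoint(state, path):
    """Save a snapshot of the job state to disk."""
    # Write to a temporary file first, then rename it, so that a crash
    # in the middle of writing never leaves a half-written checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Read back the most recent snapshot."""
    with open(path, "rb") as f:
        return pickle.load(f)


def run_job(state, until, checkpoint_path, checkpoint_every=10):
    """Toy job: add 0, 1, 2, ... to state["total"] until state["next"] == until.

    A snapshot is saved every `checkpoint_every` steps (assumed policy).
    """
    while state["next"] < until:
        state["total"] += state["next"]
        state["next"] += 1
        if state["next"] % checkpoint_every == 0:
            save_checkpoint(state, checkpoint_path)
    return state
```

If the node running the job fails, another node could call `load_checkpoint` on the saved file and pass the result back into `run_job`, which continues from the last snapshot instead of from the beginning. Moving the checkpoint file to a different machine before resuming is, in effect, a very simple form of process migration.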

Contd…
• Scavenging Idle Cycles
  ➢ An RMS system can be set up to utilize idle CPU cycles.
  ➢ It has been observed that workstations are idle between 70% and 90% of the time.
• Fault tolerance
  ➢ Fault-tolerant support can mean that a failed job can be restarted or rerun.
  ➢ This helps guarantee that the job will be completed.
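The "restart or rerun a failed job" behaviour can be sketched as a small retry wrapper. This is a minimal illustration, assuming a job is just a Python callable that raises an exception when it fails; `run_with_restart` and `max_attempts` are hypothetical names, and a real RMS would typically rerun the job on a different node rather than in the same process.

```python
def run_with_restart(job, max_attempts=3):
    """Run job(); if it raises, rerun it, up to max_attempts times.

    Returns (result, attempts_used). Raises RuntimeError if every
    attempt fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job(), attempt
        except Exception as exc:
            # Remember the failure and fall through to the next attempt.
            last_error = exc
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```

Note that the number of attempts is bounded here, so completion is only likely, not certain; a job that fails every time is eventually reported as failed.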

Contd…
• Minimization of Impact on Users
  ➢ This can be done either by reducing a job's local scheduling priority or by suspending the job.
  ➢ Suspended jobs can be restarted later or migrated to other resources in the system.
• Load balancing
  ➢ Job distribution allows for the efficient and effective usage of all the resources.
  ➢ Process migration can also be part of the load balancing strategy.
  ➢ It may be beneficial to move processes from overloaded systems to lightly loaded ones.
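Moving processes from overloaded systems to lightly loaded ones can be sketched as follows. This is one simple greedy policy chosen for illustration, not a policy taken from the slides: each node is assumed to hold a list of job costs, a node counts as overloaded when its total cost exceeds `threshold`, and `rebalance` is a hypothetical name.

```python
def rebalance(loads, threshold):
    """Migrate jobs away from overloaded nodes (single greedy pass).

    `loads` maps node name -> list of job costs and is modified in place.
    Returns the migrations as a list of (job_cost, source, destination).
    """
    migrations = []
    for source in sorted(loads):
        # Keep at least one job on the source node.
        while sum(loads[source]) > threshold and len(loads[source]) > 1:
            target = min(loads, key=lambda node: sum(loads[node]))
            job = min(loads[source])  # move the cheapest job first
            # Stop if the move would not leave the target lighter than
            # the source currently is; migrating would not help.
            if target == source or sum(loads[target]) + job >= sum(loads[source]):
                break
            loads[source].remove(job)
            loads[target].append(job)
            migrations.append((job, source, target))
    return migrations
```

For example, with `loads = {"a": [4, 3, 2], "b": [1], "c": []}` and `threshold=5`, node `a` gives up its two cheapest jobs, one to `c` and one to `b`, after which no node is above the threshold.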

Contd…
• Multiple application queues
  ➢ Job queues can be set up to help manage the resources at a particular organization.
  ➢ Each queue can be configured with certain attributes. For example, certain users can be given priority, or short jobs can be run before long jobs.
  ➢ Job queues can also be set up to manage the usage of specialized resources.
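The two example attributes above (priority for certain users, short jobs before long jobs) can be sketched with one small queue class. This is a minimal sketch under assumptions: each queue has a numeric priority where a lower number is served first, jobs carry an estimated run time in minutes, and `JobQueue` and `next_job` are hypothetical names rather than the interface of LSF, CODINE, or any other RMS package.

```python
import heapq
import itertools


class JobQueue:
    """One job queue; within it, shorter jobs are served first."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority  # lower number = served earlier (assumed)
        self._heap = []
        self._order = itertools.count()  # tie-breaker: submission order

    def submit(self, job_name, est_minutes):
        heapq.heappush(self._heap, (est_minutes, next(self._order), job_name))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def next_job(queues):
    """Return (queue_name, job_name) from the highest-priority
    non-empty queue, or None if all queues are empty."""
    for queue in sorted(queues, key=lambda q: q.priority):
        if len(queue):
            return queue.name, queue.pop()
    return None
```

A queue for a specialized resource could be modelled the same way, with a separate `JobQueue` whose jobs are only dispatched to the nodes that have that resource.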