Resource Management and Balancing
ECI, July 2005

RMS – Overview
- Resource management
  - Job management
  - Monitoring
- Resource balancing
- Information dissemination

Job Management
- The need
  - The operating system offers job and resource management services for a single computer
  - Batch job control on multi-user mainframes was performed outside the operating system
- Main advantages:
  - Structured planning and control of resource utilization
  - Abstraction: easy for users to understand and use
  - A vendor-independent user interface

Manager vs. Scheduler
- Resource manager
  - Locating and allocating resources
  - Authentication
  - Process creation and migration
- Resource scheduler
  - Queuing applications
  - Driving the manager (enforcing policy)

Job Management - Requirements
- A typical job management system offers:
  - Heterogeneous support
  - Batch support
  - Parallel support
  - Interactive support
  - Checkpointing and process migration
  - Load balancing
  - Job run-time limits
  - GUI

RMS Architecture
- Prerequisites
  - Multi-user and multitasking capabilities
  - A homogeneous OS is not required
- In practice
  - "Similar" operating systems run on all machines
  - UNIX (in all its variants) is very common in the context of RMS

Resource Description
- Requirements
  - Easy to write simple descriptions
  - Powerful enough to write complex descriptions
  - Portable representation
- RDL: a language to specify resources
  - Attributed components
  - Administrator: describes what is available
  - User: describes what is required
  - Hierarchical

RDL Example
- A 1024-node transputer with a UNIX front-end:

```
DECLARATION
BEGIN PROC Transputer
  DYNAMIC;
  EXCLUSIVE;
  DECLARATION
    BEGIN PROC Backend
      DECLARATION
        { PROC; CPU=T8; MEMORY=4; SPEED=30; REPEAT=1024; }
        { PORT; REPEAT=4; }
    END PROC
    BEGIN PROC Frontend
      DECLARATION
        { PROC; OS=Unix; REPEAT=4; }
    END PROC
  CONNECTION
    FOR i = 0 TO 3 DO
      Backend LINK i Frontend LINK i
    OD
END PROC
```

RMS Components
- User interface
  - At minimum, a command-line user interface
  - A GUI is becoming indispensable
  - Typical commands:
    - Job submission: register a job for execution
    - Status display: monitor the progress or failure of a job
    - Job deletion: cancel jobs that are no longer needed

RMS Components (contd)
- Administrative environment
  - Specify node characteristics
  - Define feasible job classes and map them to hosts
  - Define user access permissions
  - Specify resource limits for users and jobs
  - Specify policies for the assignment of jobs
  - Control and ensure the proper operation of the RMS
  - Analyze accounting data to tune the system

RMS Entities
- Queues
  - Queues are bound to hosts; jobs are assigned to queues
- Hosts
  - Compute hosts, control hosts
- Users
  - Capabilities, permissions, priorities
- Jobs
- Resources
- Policies

RMS Entities – Jobs
- Job: a collection of computational tasks
  - A single program, or several interacting programs
- In the context of RMS:
  - Batch jobs: require no manual interaction once started
  - Interactive jobs: require input during runtime
  - Parallel jobs: subtasks are spread across several hosts in a cluster
  - Checkpointing jobs: periodically save their state to the file system and can be aborted at any time

RMS Entities – Jobs
- Batch jobs
  - Dispatch jobs according to policy and availability
  - Suspend/resume and checkpoint/restart
- Interactive jobs
  - Need to maintain a terminal connection
  - "Watchdog" monitor
  - Withdraw from pool
- Parallel jobs
  - Need to integrate with the parallel environment
  - Scheduling policy is more complex

RMS Entities - Resources
- Available memory, CPU time, network bandwidth, peripheral devices, licenses
- Jobs declare their resource requirements
- The RMS enforces limits on resource consumption, which:
  - ensures quality of service
  - prevents over-subscription
  - detects over-usage
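The declare-and-enforce cycle can be sketched in a few lines of Python. This is a minimal illustration, not how any particular RMS implements it: `Host`, `Job`, `admit` and `check_usage` are hypothetical names, memory is the only resource reserved at admission time, and measured usage is assumed to come from some separate monitoring component.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    mem_mb: int             # total memory on the host
    allocated_mb: int = 0   # memory already reserved by admitted jobs


@dataclass
class Job:
    name: str
    mem_mb: int             # declared memory requirement
    cpu_limit_s: float      # declared CPU-time limit


def admit(job: Job, host: Host) -> bool:
    """Admission control: reject a job whose declared memory would
    over-subscribe the host; otherwise reserve that memory."""
    if host.allocated_mb + job.mem_mb > host.mem_mb:
        return False
    host.allocated_mb += job.mem_mb
    return True


def check_usage(job: Job, used_mem_mb: int, used_cpu_s: float) -> list:
    """Detect over-usage by comparing measured usage with the declared limits."""
    violations = []
    if used_mem_mb > job.mem_mb:
        violations.append("memory")
    if used_cpu_s > job.cpu_limit_s:
        violations.append("cpu_time")
    return violations
```

For example, on a 4096 MB host `admit` accepts a 3000 MB job and then refuses a 2000 MB one, and `check_usage` reports `["memory"]` for a job that declared 3000 MB but uses 3500 MB. What happens on a violation (warn, suspend, kill) is a separate policy decision.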

RMS Entities - Policies
- Abstract mechanisms to automate control
  - Imbalanced load is common in clusters
  - Important/urgent work may be starved
  - Unauthorized users may take advantage
  - Users may exceed desired resource usage over time
- Resource utilization policies
  - Monitor resource consumption
  - Dispatch of new jobs
- Scheduling policies
  - Dispatch of new jobs
  - Relocation of jobs

Resource Utilization Policies
- Share based
  - Resource "credit" is assigned to users, departments, etc.
  - A hierarchical share tree defines sharing
  - Establishes entitlements within a time frame
  - Fair distribution of resources
- Functional
  - Assignment by functional importance (priority)
  - Past usage is not taken into account
- Time-critical applications
  - Deadline
- Administrators like power…
  - Manual override
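Share-based entitlements from a hierarchical share tree can be computed by multiplying share fractions along the path from the root. The Python sketch below assumes exactly that scheme. `ShareNode`, `entitlements` and `most_underserved` are hypothetical names. Real share-tree policies typically also weight or decay past usage over time, which is omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ShareNode:
    name: str
    shares: int                                   # credit relative to siblings
    children: list[ShareNode] = field(default_factory=list)


def entitlements(node: ShareNode, fraction: float = 1.0) -> dict[str, float]:
    """A leaf's entitlement is the product of its share fractions
    along the path from the root of the share tree."""
    if not node.children:
        return {node.name: fraction}
    total = sum(child.shares for child in node.children)
    result: dict[str, float] = {}
    for child in node.children:
        result.update(entitlements(child, fraction * child.shares / total))
    return result


def most_underserved(ent: dict[str, float], usage: dict[str, float]) -> str:
    """Pick the user whose fraction of past usage (within the accounting
    time frame) lies furthest below their entitlement."""
    total_usage = sum(usage.values()) or 1.0
    return max(ent, key=lambda user: ent[user] - usage.get(user, 0.0) / total_usage)
```

With a root split 3:1 between two departments, where department A splits equally between `alice` and `bob` and department B holds only `carol`, the entitlements are 0.375, 0.375 and 0.25. If past usage was 50, 30 and 20 units, `bob` is furthest below his entitlement, so his jobs would be favoured next.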

Scheduling Policies
- Dispatch time: who, where
  - First-Come-First-Served
  - Select-Least-Loaded
  - Select-Fixed-Sequence
  - Combinations of the above
- Relocation: who, when, where
  - Dynamic resource balancing
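Two of the dispatch-time policies can be combined as in the Python sketch below, which takes jobs in FCFS order and places each one on the least-loaded host. A third function shows Select-Fixed-Sequence. Loads and per-job costs are abstract numbers supplied by the caller. That is an assumption for illustration. A real RMS would refresh loads from monitoring rather than adding estimated costs. The function names are hypothetical.

```python
from collections import deque
from typing import Optional


def dispatch_fcfs_least_loaded(queue: deque, load: dict, cost: dict) -> list:
    """Take jobs in arrival order (First-Come-First-Served) and place each on
    the host with the lowest current load (Select-Least-Loaded)."""
    placements = []
    while queue:
        job = queue.popleft()                  # oldest job first
        host = min(load, key=load.get)         # ties go to the first host listed
        load[host] += cost[job]                # account for the new work
        placements.append((job, host))
    return placements


def select_fixed_sequence(hosts: list, busy: set) -> Optional[str]:
    """Select-Fixed-Sequence: return the first idle host in a fixed,
    administrator-defined order, or None if all are busy."""
    for host in hosts:
        if host not in busy:
            return host
    return None
```

With loads `{"n1": 5, "n2": 1, "n3": 3}` and three jobs of cost 4, the jobs go to `n2`, `n3` and then `n1`.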

Scheduling of Parallel Processes
- Gang scheduling
  - Requires tight coupling (MPPs)
- Co-scheduling
  - Demand-based
    - False priority
    - Concurrent applications
  - Implicit
    - Busy-wait so as not to relinquish the CPU
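The busy-wait idea behind implicit co-scheduling is usually described as two-phase waiting. The waiting process first spins for a short while, in case the communicating peer is running right now, and then blocks. The Python sketch below shows only that waiting discipline. `spin_then_block` is a hypothetical helper, and a `threading.Event` stands in for an incoming message. Choosing `spin_s` (typically on the order of a context-switch or round-trip time) is left to the caller.

```python
import threading
import time


def spin_then_block(msg_arrived: threading.Event, spin_s: float) -> str:
    """Two-phase wait. Phase 1 busy-waits for up to spin_s seconds without
    giving up the CPU, so a reply from a co-scheduled peer is seen at once.
    Phase 2 blocks, letting the local scheduler run something else."""
    deadline = time.monotonic() + spin_s
    while time.monotonic() < deadline:        # phase 1: spin
        if msg_arrived.is_set():
            return "spin"
    msg_arrived.wait()                        # phase 2: block
    return "block"
```

The coordination emerges from each node's local scheduling decisions rather than from a global gang scheduler.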

RMS Challenges
- Open interfaces
  - Export load balancing/distributed capabilities
  - Export status information (load, job status, queues)
  - Control/assistance from the application
- Integration with other environments (MPI)
- Extend functionality for special cases
- The API must be simple, usable, abstract, and robust

RMS Example: CODINE
- CODINE/GRD
  - cod_qmaster: master daemon
  - cod_schedd: scheduler daemon
  - cod_execd: execution daemon
- Continuously match utilization with policies
  - GRD monitors and adjusts resource usage across all processes of a job
  - Feedback adjusts shares toward changing requirements

Static Scheduling Scheme
[Figure]

Dynamic Scheduling Scheme
[Figure]

RMS Example: PBS
- Portable Batch System
  - Scheduler: job-to-node mapping, queues
  - Server: communications, logs
  - Control daemon (per node): executive agent
- Scope: single node
- Job arrays
- Task Management interface

RMS Example: Condor
- Condor: a distributed job scheduler
- Harvests idle workstations
- Job scheduling and migration
- Advertising mechanism
  - Both jobs and workstations advertise their presence
  - Jobs advertise their requirements (job description file)
  - Workstations advertise their capabilities

Condor: Example JDF

```
# Select the runtime environment
universe     = vanilla
executable   = some_job
requirements = (Arch == "INTEL" && OpSys == "LINUX")
# Rank candidate machines; attributes refer to the target (machine) ad
rank         = (Memory * 10000) + KFlops
arguments    = -verbose
# Redirect stdin and stdout
input        = in.dat
output       = out.dat
log          = log.txt
# Add the job to the queue
queue
```
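Matchmaking between a job's advertisement and the workstations' advertisements can be sketched in Python. This is a simplified model, not the ClassAd language. Ads are plain dictionaries, and `Requirements` and `Rank` are Python callables standing in for ClassAd expressions. `matchmake` and the `Owner` attribute used below are hypothetical. As in Condor, a match needs both sides' requirements to hold. Among the matches, the job's `Rank` picks the preferred machine.

```python
from typing import Callable, Optional

Ad = dict  # an advertisement: attribute name -> value or predicate


def matchmake(job: Ad, machines: list) -> Optional[Ad]:
    """Return the machine ad that satisfies the job's Requirements, whose own
    (optional) Requirements accept the job, and that the job ranks highest."""
    def compatible(machine: Ad) -> bool:
        machine_ok: Callable[[Ad], bool] = machine.get("Requirements", lambda _job: True)
        return job["Requirements"](machine) and machine_ok(job)

    candidates = [m for m in machines if compatible(m)]
    return max(candidates, key=job["Rank"], default=None)


# The job description above, translated into this model
job = {
    "Owner": "bob",  # hypothetical owner attribute
    "Requirements": lambda m: m["Arch"] == "INTEL" and m["OpSys"] == "LINUX",
    "Rank": lambda m: m["Memory"] * 10000 + m["KFlops"],
}
```

Given two Linux/Intel workstations with 512 MB and 1024 MB, the larger one wins on `Rank`. If its owner's `Requirements` admit only her own jobs, the job falls back to the other machine.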

RMS: Condor (contd)
- Universe
  - Vanilla: sequential applications (shared file system)
  - MPI, PVM: integrated with the parallel environment
  - Globus: grid computing environment
  - Standard: enables process migration
- Process migration
  - Reschedule when a higher-priority job arrives
  - The user reclaims her workstation
  - The job must be linked with a special library

RMS: Condor (contd)
- Access to data
  - Shared file system
  - Condor file transfer mechanism: automatic prefetch and postfetch
  - Remote I/O calls (in the standard universe)
- Architecture
  - Central manager
  - Server on each node

Known Condor Pools
[Figure]

Monitoring
- Design choices
  - Centralized vs. decentralized
  - Periodic vs. request-driven
  - Flat vs. hierarchical
  - Resolution of information
  - Focused view

Monitoring Example: Parmon
- Features:
  - Online creation of node and group databases
  - Monitoring at component, node, group, or entire-cluster level
  - Monitoring of CPU, memory, disk and network, processes, log files, etc.
  - Facility to define events, with automatic notification
  - Miscellaneous: message broadcast, remote administration, GUI

Load Balancing
- Application vs. system level
- Static vs. dynamic vs. adaptive
- Centralized vs. decentralized
- Receiver-initiated vs. sender-initiated
- Parallel applications
- On-line nature

LB: Application Level
- Application level
  - Round robin
  - Randomized
  - Recursive bisection
  - Other optimizations
- Hard to estimate execution times
  - Indeterminate number of steps
  - Unpredictable load
  - Communication delays
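Recursive bisection can be illustrated with recursive coordinate bisection over 2-D work items. The procedure splits the items at the median of one coordinate, alternates the axis, and recurses. The sketch assumes equal-cost items, 2-D points, and a power-of-two number of parts. With unequal costs one would split at the weighted median instead. The function name is hypothetical.

```python
def recursive_bisection(points: list, parts: int, axis: int = 0) -> list:
    """Recursive coordinate bisection of 2-D points into `parts` groups of
    (nearly) equal size; `parts` is assumed to be a power of two."""
    if parts == 1:
        return [points]
    ordered = sorted(points, key=lambda p: p[axis])  # stable sort on current axis
    mid = len(ordered) // 2                          # median split balances counts
    other = 1 - axis                                 # alternate x and y
    return (recursive_bisection(ordered[:mid], parts // 2, other)
            + recursive_bisection(ordered[mid:], parts // 2, other))
```

Each group would be assigned to one processor. Because the split points follow the data, neighbouring items tend to land on the same processor, which keeps communication local.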

LB: System Level
- System level
  - Round robin
  - Randomized
- Estimate run time
  - Specified by the job description
  - Estimated from past experience
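The two ways of estimating run time can be combined by trusting the job description when it gives a figure and otherwise falling back on history. The Python sketch below uses an exponentially weighted moving average per (user, executable) pair. That choice of estimator, the `alpha` smoothing factor, and the one-hour default are assumptions for illustration. `RuntimeEstimator` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


class RuntimeEstimator:
    """Estimate job run times from the job description or from past runs."""

    def __init__(self, alpha: float = 0.5, default_s: float = 3600.0):
        self.alpha = alpha            # weight given to the most recent run
        self.default_s = default_s    # used when nothing is known
        self.history: Dict[Tuple[str, str], float] = {}

    def record(self, user: str, exe: str, runtime_s: float) -> None:
        """Fold a finished run into the moving average for (user, exe)."""
        key = (user, exe)
        old = self.history.get(key)
        if old is None:
            self.history[key] = runtime_s
        else:
            self.history[key] = self.alpha * runtime_s + (1 - self.alpha) * old

    def estimate(self, user: str, exe: str, declared_s: Optional[float] = None) -> float:
        """Prefer the declared run time, then past experience, then the default."""
        if declared_s is not None:
            return declared_s
        return self.history.get((user, exe), self.default_s)
```

After runs of 100 s and 200 s with `alpha=0.5`, the estimate for the same user and executable is 150 s, unless the job description declares a run time.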

LB Example: MOSIX
- Decentralized
- Symmetric
- Deterministic
- Responsive
- Stable
- Competitive
- Resources: CPU, memory, I/O

Load Balancing Over Network
- Distribute workload or network traffic load across the cluster
  - Nodes may be interconnected among themselves
  - Nodes must be connected to the balancing device
- Processing nodes provide status information:
  - current processor load
  - application system load
  - number of active users
  - availability of network protocol buffers
  - other specific resources

Load Balancing Over Network
- Balancing device
  - Monitors the status of all nodes
  - Dictates where to direct the next job
  - May be a single unit or a group in a tree hierarchy
- Uses one or more algorithms or methods
  - Static or dynamic settings
  - Decides which node gets the next incoming connection request

Factors in Network Balancing
- Wire-speed processing
- Node operating system limitations
  - Packet processing, number of connections, interrupts
- Balancing device limitations
  - Tables, memory
  - Session-based traffic, non-session UDP
  - Application dependencies (affinity)

Simple Balancing Methods
- Weighting
  - Assign weights to nodes of different capacities
- Randomization
  - Works well in an environment of identical nodes
- Round-robin
  - Commonly used on its own in DNS (affected by address caching)
  - Effective where all nodes in the cluster are identical in capacity and performance
- Hashing
  - Packets from the same source address are always assigned to the same server
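Three of these methods fit in a few lines of Python. The sketch assumes integer node weights and uses SHA-256 over the source address string as the hash. Real balancers typically hash several header fields and may use consistent hashing so that adding a node remaps fewer clients. The function names are hypothetical.

```python
import hashlib
import itertools
import random


def weighted_round_robin(weights: dict):
    """Round-robin where a node of weight w receives w requests per cycle."""
    schedule = [node for node, w in weights.items() for _ in range(w)]
    return itertools.cycle(schedule)


def weighted_random(weights: dict, rng: random.Random) -> str:
    """Randomization, biased by node capacity."""
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes])[0]


def source_hash(src_ip: str, nodes: list) -> str:
    """Hashing: a given source address always maps to the same server
    (as long as the node list does not change)."""
    digest = hashlib.sha256(src_ip.encode()).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]
```

This naive weighted round-robin sends a heavy node its requests back to back. Interleaving them more smoothly is a common refinement.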

Simple Balancing Methods
- Least connections
  - Assigns to the node that currently has the fewest connections (fewest connections ≠ least load)
- Minimum misses
  - Assigns to the node that has processed the fewest incoming requests over its history
- Fastest response
  - Assigns to the node with the fastest response
  - Requires active monitoring of the individual nodes, e.g.:
    - sending ICMP packets with the 'ping' command
    - a proprietary mechanism based on UDP packets
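Least connections and minimum misses differ only in which counter they minimize: the number of connections open right now, or the number of requests served over the node's history. The Python sketch below assumes the balancer sees every connection open and close, and that probe latencies (from ping or a UDP probe) are gathered elsewhere. The class and function names are hypothetical.

```python
class ConnectionTracker:
    """Per-node counters for connection-count based balancing."""

    def __init__(self, nodes: list):
        self.active = {n: 0 for n in nodes}   # currently open connections
        self.served = {n: 0 for n in nodes}   # requests handled over the node's history

    def _pick(self, table: dict) -> str:
        node = min(table, key=table.get)      # ties go to the first node listed
        self.active[node] += 1
        self.served[node] += 1
        return node

    def least_connections(self) -> str:
        return self._pick(self.active)

    def minimum_misses(self) -> str:
        return self._pick(self.served)

    def release(self, node: str) -> None:
        """Call when a connection to `node` closes."""
        self.active[node] -= 1


def fastest_response(latency_ms: dict) -> str:
    """Fastest response: the node whose latest probe answered quickest."""
    return min(latency_ms, key=latency_ms.get)
```

Because `release` only decrements `active`, a node that has handled many short connections looks idle to least connections but busy to minimum misses.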

Advanced Balancing Methods
- Primary optimization vectors
  - Node traffic: predict the volume of traffic
  - Network traffic: monitor node state
  - Node-load-based balancing (which load?)
  - DNS load balancing: simple
- Topology-based: reduce latency
- Application-specific performance
- Policy-based optimization
  - Application, bandwidth, administration, security

Common Errors
- There are four common errors:
  - Overflow
  - Underflow
  - Routing errors
  - Induced network errors
- These may destabilize efficient network clustering

Information Dissemination
- Central
- Decentralized
- Load incurred on the system
  - Processing load
  - Network load
- Partial knowledge: gossip algorithms
  - Example: finding the average load
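The average-load example can be solved with push-sum gossip. Every node holds a (sum, weight) pair, keeps half of it each round, and pushes the other half to one randomly chosen node. Sums and weights are only moved, never created, so each node's ratio sum/weight converges to the global average even though no node ever sees the whole system. The sketch simulates all nodes in one process with synchronous rounds, which is a simplifying assumption. `push_sum_average` is a hypothetical name.

```python
import random


def push_sum_average(loads: list, rounds: int, seed: int = 0) -> list:
    """Push-sum gossip: every node keeps half of its (sum, weight) pair and
    pushes the other half to one random node each round. Each node's ratio
    sum/weight converges to the global average of `loads`."""
    rng = random.Random(seed)
    n = len(loads)
    s = [float(x) for x in loads]   # running sums (start: own load)
    w = [1.0] * n                   # running weights (start: 1 per node)
    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            j = rng.randrange(n)    # random gossip partner (may be i itself)
            for k in (i, j):        # half stays at i, half goes to j
                new_s[k] += s[i] / 2
                new_w[k] += w[i] / 2
        s, w = new_s, new_w
    return [s[i] / w[i] for i in range(n)]   # each node's local estimate
```

Each node exchanges a constant amount of data per round, so the processing and network load of dissemination stays bounded while the estimates converge.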