Resource Management and Balancing
ECI, July 2005
RMS – Overview
- Resource management
- Job management
- Monitoring
- Resource balancing
- Information dissemination
Job Management
- The need
  - An operating system offers job and resource management services for a single computer
  - Batch job control on multi-user mainframes was performed outside the operating system
- Main advantages are:
  - Structured resource utilization planning and control
  - Abstraction, easy-to-{understand, use} for the user
  - Provides a vendor-independent user interface
Manager vs. Scheduler
- Resource manager
  - Locating and allocating resources
  - Authentication
  - Process creation and migration
- Resource scheduler
  - Queuing applications
  - Driving the manager (enforcing policy)
Job Management - Requirements
- A typical job management system offers
  - Heterogeneous support
  - Batch support
  - Parallel support
  - Interactive support
  - Checkpointing and process migration
  - Load balancing
  - Job run-time limits
  - GUI
RMS Architecture
- Prerequisites
  - Multi-user & multitasking capabilities
  - A homogeneous OS is not a requirement
- In practice
  - “Similar” operating systems run on all machines
  - UNIX (in all its variants) is the customary platform for an RMS
Resource Description
- Requirements
  - Easy to generate a simple description
  - Powerful enough to generate a complex description
  - Portable representation
- RDL: Language to specify resources
  - Attributed components
  - Administrator: describe what’s available
  - User: describe what’s required
  - Hierarchical
RDL Example
- A 1024-node transputer with a Unix front-end

    DECLARATION
    BEGIN PROC Transputer
      DYNAMIC; EXCLUSIVE;
      DECLARATION
      BEGIN PROC Backend
        DECLARATION
        { PROC; CPU=T8; MEMORY=4; SPEED=30; REPEAT=1024; }
        { PORT; REPEAT=4; }
      END PROC
      BEGIN PROC Frontend
        DECLARATION
        { PROC; OS=Unix; Repeat = 4; }
      END PROC
      CONNECTION
      FOR I = 0 to 3 DO
        Backend LINK i Frontend LINK i
      OD
    END PROC
RMS Components
- User interface
  - At the minimum: a command line user interface
  - GUI becoming indispensable
- Typical commands
  - Job submission to register a job for execution
  - Status display to monitor progress or failure of a job
  - Job deletion to cancel jobs no longer needed
RMS Components (contd)
- Administrative environment
  - Specify node characteristics
  - Define feasible job classes and map them to hosts
  - Define user access permissions
  - Specify resource limitations for users and jobs
  - Specify policies for the assignment of jobs
  - Control and ensure proper operation of the RMS
  - Analyze accounting data to tune the system
RMS Entities
- Queues
  - Queues bound to hosts, jobs assigned to queues
- Hosts
  - Compute hosts, control hosts
- Users
  - Capabilities, permissions, priorities
- Jobs
- Resources
- Policies
RMS Entities – Jobs
- Job: collection of computational tasks
  - A single program, or several interacting programs
- In the context of RMS
  - Batch jobs: require no manual interaction once started
  - Interactive jobs: require input during runtime
  - Parallel jobs: subtasks spread across several hosts in a cluster
  - Checkpointing jobs: periodically save status to the file system and can be aborted at any time
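The checkpointing job type above can be illustrated with a small sketch. It assumes a toy computation (summing step indices) and a JSON checkpoint file; the function name `run_with_checkpoints`, its parameters, and the `stop_after` switch that simulates an abort are all hypothetical and not taken from any particular RMS.

```python
import json
import os


def run_with_checkpoints(path, total_steps, every=10, stop_after=None):
    """Toy checkpointing job: sums 0..total_steps-1, saving state periodically.

    `stop_after` (hypothetical) simulates the RMS aborting the job after that
    many steps have been executed in this run.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        # Restart: resume from the last saved checkpoint.
        with open(path) as f:
            state = json.load(f)

    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            return None  # simulated abort; work since the last checkpoint is lost
        state["acc"] += state["step"]
        state["step"] += 1
        executed += 1
        if state["step"] % every == 0:
            # Write to a temporary file first so an abort never leaves
            # a half-written checkpoint behind.
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, path)
    return state["acc"]
```

An aborted run returns `None`; calling the function again with the same `path` continues from the most recent checkpoint and repeats only the steps executed after it.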
RMS Entities – Jobs (contd)
- Batch jobs
  - Dispatch jobs according to policy and availability
  - Suspend/resume & checkpoint/restart
- Interactive jobs
  - Need to maintain a terminal connection
  - “Watchdog” monitor withdraw from pool
- Parallel jobs
  - Need to integrate with the parallel environment
  - Scheduling policy is more complex
RMS Entities - Resources
- Available memory, CPU time, network bandwidth, peripheral devices, and licenses
- Jobs declare resource requirements
- RMS enforces resource consumption
  - ensures quality of service
  - prevents over-subscription
  - detects over-usage
RMS Entities - Policies
- Abstract mechanisms to automate control
  - imbalanced load is common in clusters
  - important/urgent work is starved
  - unauthorized users may take advantage
  - users may exceed desired resource usage over time
- Resource Utilization Policies
  - Monitor resource consumption
  - Dispatch of new jobs
- Scheduling Policies
  - Dispatch of new jobs
  - Relocation of jobs
Resource Utilization Policies
- Share based
  - Resource “credit” is assigned to users, depts…
  - Hierarchical share tree defines sharing
  - Establish entitlements within a time frame
  - Fair distribution of resources
- Functional
  - Assignment by functional importance (priority)
  - Past usage is not taken into account
- Deadline
  - Time-critical applications
- Manual override
  - Administrators like power…
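A hierarchical share tree can be read as follows: each node splits its parent's entitlement among its children in proportion to their shares. The sketch below works under that reading only. The tree layout, the department and user names, and the function `entitlements` are invented for illustration, leaf names are assumed to be unique, and past usage within the time frame is ignored.

```python
def entitlements(tree, total=1.0):
    """Return {leaf_name: fraction of the cluster} for a share tree.

    `tree` maps a name to (shares, subtree_or_None). A node's fraction is its
    parent's fraction times its shares divided by the sum of sibling shares.
    """
    result = {}
    share_sum = sum(shares for shares, _ in tree.values())
    for name, (shares, children) in tree.items():
        fraction = total * shares / share_sum
        if children:
            # Inner node (e.g. a department): split its fraction further.
            result.update(entitlements(children, fraction))
        else:
            # Leaf (e.g. a user): this is the final entitlement.
            result[name] = fraction
    return result


# Hypothetical share tree: two departments, three users.
share_tree = {
    "physics": (60, {"alice": (1, None), "bob": (3, None)}),
    "chemistry": (40, {"carol": (1, None)}),
}
user_entitlements = entitlements(share_tree)
```

Here `physics` is entitled to 60% of the resources, split 1:3 between `alice` and `bob`, while `carol` receives the whole 40% assigned to `chemistry`. A real share-based policy would additionally compare these targets against accumulated usage.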
Scheduling Policies
- Dispatch time – who, where
  - First-Come-First-Served
  - Select-Least-Loaded
  - Select-Fixed-Sequence
  - Combinations of the above
- Relocation – who, when, where
  - Dynamic resource balancing
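The dispatch-time policies can be combined: First-Come-First-Served answers "who", while Select-Least-Loaded or Select-Fixed-Sequence answers "where". The following is a minimal sketch under simplifying assumptions: load is a single number per host, every job adds a fixed `cost` to the load of its host, and all function names and the `limit` threshold are hypothetical.

```python
from collections import deque


def select_least_loaded(hosts, load):
    """Where: pick the host with the lowest current load."""
    return min(hosts, key=lambda h: load[h])


def select_fixed_sequence(hosts, load, limit=1.0):
    """Where: pick the first host in a fixed order whose load is below `limit`."""
    for h in hosts:
        if load[h] < limit:
            return h
    return None  # no host currently has capacity


def dispatch_fcfs(jobs, hosts, load, select, cost=0.5):
    """Who: serve jobs strictly in arrival order, placing each via `select`."""
    queue = deque(jobs)
    placement = []
    while queue:
        host = select(hosts, load)
        if host is None:
            break  # remaining jobs stay queued
        job = queue.popleft()
        load[host] += cost  # assumed: each job adds a fixed amount of load
        placement.append((job, host))
    return placement
```

For example, with loads `{"a": 0.8, "b": 0.1}` the least-loaded policy sends the first two jobs to `b` and the third to `a`, whereas the fixed-sequence policy fills `a` first and only then moves on to `b`.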
Scheduling of Parallel Processes
- Gang scheduling
  - Requires tight coupling (MPPs)
- Co-scheduling
  - Demand-based
    - False priority
    - Concurrent applications
  - Implicit
    - Busy wait to not relinquish the CPU
RMS Challenges
- Open Interfaces
  - Export load balancing/distributed capabilities
  - Export status info (load, job status, queues)
  - Control/assistance from application
- Integration with other environments (MPI)
- Extend functionality for special cases
- API must be: simple, usable, abstract, robust
RMS Example: CODINE
- CODINE/GRD
  - cod_qmaster: master daemon
  - cod_schedd: scheduler daemon
  - cod_execd: execution daemon
- Continuously match utilization with policies
  - GRD monitors and adjusts resource usage correlated to all processes of a job
  - Feedback to adjust shares towards changing requirements
Static Scheduling Scheme [figure]
Dynamic Scheduling Scheme [figure]
RMS Example: PBS
- Portable Batch System
  - Scheduler – job to node mapping, queues
  - Server – communications, logs
  - Control daemon (per node) – executive agent
- Scope – single node
- Job arrays
- Task Management interface
RMS Example: Condor
- Condor: a distributed job scheduler
- Harvest idle workstations
- Job scheduling and migration
- Advertising mechanism
  - Both job and W/S advertise presence
  - Jobs advertise requirements (job description file)
  - W/S advertise their capabilities
Condor: Example JDF

    universe     = vanilla      # select runtime environment
    executable   = some_job
    requirements = (Arch=="INTEL" && OpSys=="LINUX")
    rank         = (Memory * 10000) + KFlops      # target
    arguments    = -verbose
    input        = in.dat       # redirect to stdin
    output       = out.dat      # redirect to stdout
    log          = log.txt
    Queue                       # add job to queue
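The role of the `requirements` and `rank` expressions can be sketched as a simple matching step: keep the workstations that satisfy the requirements, then prefer the one with the highest rank. This is an illustrative simplification, not Condor's implementation. The workstation ads, their values, and the function `match` are hypothetical, and the workstation's own constraints on which jobs it accepts are left out.

```python
# Hypothetical workstation advertisements (capabilities).
machines = [
    {"Name": "ws1", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 512, "KFlops": 20000},
    {"Name": "ws2", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 1024, "KFlops": 15000},
    {"Name": "ws3", "Arch": "SUN4u", "OpSys": "SOLARIS", "Memory": 2048, "KFlops": 30000},
]

# The job advertisement, mirroring the requirements and rank lines of the JDF.
job = {
    "requirements": lambda m: m["Arch"] == "INTEL" and m["OpSys"] == "LINUX",
    "rank": lambda m: (m["Memory"] * 10000) + m["KFlops"],
}


def match(job, machines):
    """Return the acceptable machine with the highest rank, or None."""
    candidates = [m for m in machines if job["requirements"](m)]
    return max(candidates, key=job["rank"], default=None)


best = match(job, machines)
```

With these ads, `ws3` is filtered out by the requirements, and `ws2` wins over `ws1` because the rank expression weights memory far more heavily than KFlops.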
RMS: Condor (contd)
- Universe
  - Vanilla: sequential apps (shared FS)
  - MPI, PVM: integrated with parallel environment
  - Globus: grid computing environment
  - Standard: enables process migration
- Process migration
  - Reschedule higher priority job
  - User reclaims her W/S
  - Must be linked with a special library
RMS: Condor (contd)
- Access to data
  - Shared file system
  - Condor file transfer mechanism
    - Automatically prefetch, postfetch
  - Remote I/O calls (in standard universe)
- Architecture
  - Central manager
  - Server on each node
Known Condor Pools [figure]
Monitoring
- Design choices
  - Centralized vs. decentralized
  - Periodic vs. request driven
  - Flat vs. hierarchical
  - Resolution of information
  - Focused view
Monitoring Example: Parmon
- Features:
  - Online creation of Node and Group database
  - Component, Node, Group, or entire Cluster level
  - Monitoring of CPU, memory, disk and network, processes, log files, etc.
  - Facility to define events & automatic notification
  - Misc: message broadcast, remote admin, GUI
Load Balancing
- Application vs. system
- Static vs. dynamic vs. adaptive
- Centralized vs. decentralized
- Receiver initiated vs. sender initiated
- Parallel applications
- On-line nature
LB: Application Level
- Application level
  - Round robin
  - Randomized
  - Recursive bisection
  - Other optimization
- Hard to estimate execution times
  - Indeterminate no. of steps
  - Unpredictable load
  - Communication delays
LB: System Level
- System level
  - Round robin
  - Randomized
- Estimate run time
  - Specified by job description
  - Estimate from past experience
LB Example: MOSIX
- Decentralized
- Symmetric
- Deterministic
- Responsive
- Stable
- Competitive
- Resources: CPU, memory, I/O
Load Balancing Over Network
- Distribute workload or network traffic load across the cluster
  - Nodes may be interconnected among themselves
  - Must be connected to the balancing device
- Processing nodes provide status information
  - current processor load
  - the application system load
  - number of active users
  - the availability of network protocol buffers
  - other specific resources
Load Balancing Over Network (contd)
- Balancing device
  - monitors the status of all nodes
  - dictates where to direct the next job
  - a single unit or a group in a tree hierarchy
  - uses one or more algorithms or methods
  - static or dynamic setting
- Decides which node gets the next incoming connection request
Factors in Network Balancing
- Wire-speed processing
- Node operating system limitation
  - Packet processing, no. of connections, interrupts
- Balancing device limitations
  - Tables, memory
  - Session based traffic, non-session UDP
- Application dependencies (affinity)
Simple Balancing Methods
- Weighting
  - Assign weights to nodes of different capacities
- Randomization
  - Works well in an identical-node environment
- Round-Robin
  - Commonly used by itself in DNS (address caching)
  - Effective where all the nodes in the cluster are identical in capacity and performance
- Hashing
  - Packets from the same source address will always get assigned to the same server
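Two of these methods can be sketched in a few lines: round-robin combined with weighting, and hashing on the source address. The sketch assumes small integer weights and uses CRC32 as the hash purely for illustration; the function names and node names are hypothetical.

```python
import itertools
import zlib


def weighted_round_robin(weights):
    """Cycle through nodes, each appearing as often as its integer weight."""
    expanded = [node for node, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def hash_select(source_addr, nodes):
    """Map a source address to a node; the same address always maps to the same node."""
    return nodes[zlib.crc32(source_addr.encode()) % len(nodes)]


# Node "a" has twice the capacity of node "b" (assumed weights).
rr = weighted_round_robin({"a": 2, "b": 1})
first_six = [next(rr) for _ in range(6)]
```

With equal weights the first function degenerates to plain round-robin. The hashing variant keeps a client on one server, but the mapping changes for most clients whenever the node list changes size.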
Simple Balancing Methods (contd)
- Least Connections
  - Assigns to the node which currently has the fewest connections ( ≠ least load )
- Minimum Misses
  - Assigns to the node which has processed the fewest incoming requests in its history
- Fastest Response
  - Assigns to the node with the fastest response
  - Requires active monitoring of the individual nodes
    - Sending ICMP packets with the ‘ping’ command
    - Proprietary mechanism based upon UDP packets
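Least Connections needs only a per-node counter of open connections, which is also why it differs from least load: a node with few but expensive connections still looks attractive. A minimal sketch, with a hypothetical class name and ties broken by node order:

```python
class LeastConnections:
    """Track open connections per node and pick the node with the fewest."""

    def __init__(self, nodes):
        self.active = {node: 0 for node in nodes}

    def acquire(self):
        """Choose a node for a new connection and count the connection."""
        node = min(self.active, key=self.active.get)  # ties: first node wins
        self.active[node] += 1
        return node

    def release(self, node):
        """Record that a connection to `node` has closed."""
        self.active[node] -= 1


lb = LeastConnections(["a", "b"])
sequence = [lb.acquire(), lb.acquire()]  # one connection on each node
lb.release("a")                          # the connection on "a" closes
sequence.append(lb.acquire())            # "a" now has the fewest connections
```

Minimum Misses would use the same structure with a counter that is never decremented.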
Advanced Balancing Methods
- Primary optimization vectors
  - Node traffic – predict volume of traffic
  - Network traffic – monitor node state
  - Node-load based balancing – (which load?)
  - DNS load balancing – simple
  - Topology-based – reduce latency
  - Application-specific performance
- Policy based optimization
  - Application, bandwidth, admin, security
Common Errors
- There are four common errors
  - Overflow
  - Underflow
  - Routing errors
  - Induced network errors
- May destabilize efficient network clustering
Information Dissemination
- Central
- Decentralized
- Load incurred on system
  - Processing load
  - Network load
- Partial knowledge – gossip algorithms
  - Example: finding average load
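The average-load example can be sketched with pairwise gossip: in each round two randomly chosen nodes exchange their current estimates and both adopt the mean. The sum of all estimates is preserved, so every node drifts towards the global average without any node seeing the whole system. This is one common gossip variant chosen for illustration; the function name, round count, and seed are assumptions.

```python
import random


def gossip_average(loads, rounds=200, seed=0):
    """Pairwise-averaging gossip: return each node's estimate of the mean load."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = list(loads)
    n = len(estimates)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)  # two distinct nodes talk to each other
        mean = (estimates[i] + estimates[j]) / 2
        estimates[i] = estimates[j] = mean
    return estimates


loads = [0.9, 0.1, 0.5, 0.3, 0.7]  # hypothetical per-node loads, true mean 0.5
estimates = gossip_average(loads)
```

Each exchange involves only two nodes, so the network and processing load per node stays small, at the cost of needing several rounds before the estimates agree.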