Outline
• Distributed scheduling
  – Motivations
  – Design issues
  – Distributed scheduling algorithms
12/22/2021 COP 5611 - Operating Systems 1
Motivations
• In a locally distributed system, there is a good possibility that several computers are heavily loaded while others are idle or lightly loaded
  – If we can move jobs around (in other words, distribute the load more evenly), the overall performance of the system can be improved
Distributed Scheduling
• A distributed scheduler is a resource management component of a distributed operating system that focuses on judiciously and transparently redistributing the load of the system among the computers to maximize the overall performance
Issues in Load Distribution
• Load estimation
  – Queue lengths
  – CPU utilization
• Load distributing algorithms
  – Static
  – Dynamic
  – Adaptive
Issues in Load Distribution – cont.
• Load balancing vs. load sharing
  – Load sharing tries to reduce the likelihood of an unshared state (where one computer is idle while at the same time others are overloaded) by transferring tasks
  – Load balancing algorithms attempt to equalize the loads at all computers
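The two goals can be contrasted with a small sketch. The function names and the `overload_threshold` parameter are hypothetical; "load" is taken to be the CPU queue length. Load sharing only cares whether an unshared state exists, while load balancing cares about how far every node is from the mean.

```python
from statistics import mean
from typing import Sequence


def in_unshared_state(queue_lengths: Sequence[int], overload_threshold: int) -> bool:
    """Load-sharing view: is some node idle while another is overloaded?"""
    someone_idle = any(q == 0 for q in queue_lengths)
    someone_overloaded = any(q > overload_threshold for q in queue_lengths)
    return someone_idle and someone_overloaded


def max_deviation_from_mean(queue_lengths: Sequence[int]) -> float:
    """Load-balancing view: how far is the most unbalanced node from the mean load?"""
    avg = mean(queue_lengths)
    return max(abs(q - avg) for q in queue_lengths)


loads = [1, 1, 4]
print(in_unshared_state(loads, overload_threshold=2))  # False: no node is idle
print(max_deviation_from_mean(loads))                  # 2: node with load 4 is 2 above the mean
```

For `[1, 1, 4]` a load-sharing algorithm has nothing to do, whereas a load-balancing algorithm would still try to move work off the third node.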
Issues in Load Distribution – cont.
• Preemptive vs. non-preemptive transfers
  – Preemptive task transfers involve the transfer of tasks that are partially executed
    • Such a transfer is generally expensive because the entire task state must be transferred: the virtual memory image, the process control block, unread I/O buffers and messages, file pointers, timers that have been set, and so on
  – Non-preemptive transfers involve the transfer of tasks that have not started yet
  – Environment transfer
    • Even a non-preemptive transfer must carry information about the environment in which the task will execute, such as the user's current working directory and the privileges inherited by the task
Components of a Load Distributing Algorithm
• Four components
  – Transfer policy
    • Determines when a node needs to send tasks to other nodes or can receive tasks from other nodes
  – Selection policy
    • Determines which task(s) to transfer
  – Location policy
    • Finds suitable nodes to share the load with
Components of a Load Distributing Algorithm – cont.
• Four components – continued
  – Information policy
    • Determines when information about the states of other nodes is collected, from where, and what information is collected
    • Variants: demand-driven, periodic, and state-change driven
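One way to see how the four components fit together is a skeleton in which each policy is a pluggable function. This is a hypothetical sketch, not a prescribed interface: `LoadDistributor`, `least_loaded_below`, and the concrete lambdas are illustrative choices. Because state is collected only once a transfer is actually needed, the information policy here behaves as demand-driven.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class LoadDistributor:
    """Hypothetical skeleton: the four policies plugged in as functions."""
    is_sender: Callable[[int], bool]                          # transfer policy: local load -> sender?
    select_task: Callable[[List[str]], Optional[str]]         # selection policy: which task to move
    find_receiver: Callable[[Dict[int, int]], Optional[int]]  # location policy: pick a destination
    collect_state: Callable[[], Dict[int, int]]               # information policy: loads of other nodes

    def maybe_transfer(self, local_tasks: List[str]) -> Optional[Tuple[str, int]]:
        """Return (task, destination) if a transfer should happen, else None."""
        if not self.is_sender(len(local_tasks)):
            return None
        task = self.select_task(local_tasks)
        if task is None:
            return None
        # State is gathered only when a transfer is needed (demand-driven)
        destination = self.find_receiver(self.collect_state())
        if destination is None:
            return None
        return task, destination


def least_loaded_below(limit: int) -> Callable[[Dict[int, int]], Optional[int]]:
    """Location policy: the least-loaded node, provided its load is below `limit`."""
    def choose(loads: Dict[int, int]) -> Optional[int]:
        if not loads:
            return None
        best = min(loads, key=loads.get)
        return best if loads[best] < limit else None
    return choose


distributor = LoadDistributor(
    is_sender=lambda queue_length: queue_length > 2,
    # Assumption: the last task in the list is the newest one, which has not started yet
    select_task=lambda tasks: tasks[-1] if tasks else None,
    find_receiver=least_loaded_below(2),
    collect_state=lambda: {1: 4, 2: 0, 3: 1},  # stand-in for real polling
)
print(distributor.maybe_transfer(["a", "b", "c"]))  # ('c', 2)
```

Swapping `collect_state` for a function that reads a cache refreshed on a timer would turn the same skeleton into a periodic information policy.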
Stability
• The queuing-theoretic perspective
  – The CPU queues grow without bound if the arrival rate of work is greater than the rate at which the system can perform work
  – A load distributing algorithm is effective under a given set of conditions if it improves performance relative to that of a system not using load distribution
• Algorithmic stability
  – An algorithm is unstable if it can perform fruitless actions indefinitely with finite probability
    • Example: processor thrashing
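As a rough illustration of the queuing-theoretic view, assume (this is not stated on the slide) that a single node behaves as an M/M/1 queue with Poisson task arrivals at rate λ and exponentially distributed service at rate μ. The utilization and the mean number of tasks at the node are then:

```latex
\rho = \frac{\lambda}{\mu}, \qquad
\bar{N} = \frac{\rho}{1 - \rho} \quad \text{for } 0 \le \rho < 1
```

Since N̄ grows without bound as ρ approaches 1, and no steady state exists when λ ≥ μ, the queue explodes once work arrives faster than it can be served. One way to read the instability results for load distributing algorithms: polling and transfer overhead consumes CPU cycles, which effectively lowers μ and can push ρ toward 1 at high loads.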
Sender-Initiated Algorithms
• In sender-initiated algorithms, an overloaded node initiates the load distribution
  – Transfer policy
  – Selection policy
  – Location policy
    • Random
    • Threshold
    • Shortest
  – Information policy
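The slide names the policies but not their details. Below is a minimal sketch of one common combination, assuming a queue-length threshold for the transfer policy, "move only the newly arrived task" for the selection policy, and the Threshold location policy bounded by a poll limit. The names `on_task_arrival`, `threshold`, and `poll_limit` are hypothetical.

```python
import random
from typing import List


def on_task_arrival(node: int, queues: List[int], threshold: int,
                    poll_limit: int, rng: random.Random) -> int:
    """Handle a new task arriving at `node`; return the node that will run it.

    `queues[i]` is the CPU queue length at node i (modified in place).
    """
    queues[node] += 1  # the new task joins the local queue
    # Transfer policy: the node is a sender only if its queue now exceeds the threshold
    if queues[node] <= threshold:
        return node
    others = [n for n in range(len(queues)) if n != node]
    # Location policy (Threshold): poll distinct, randomly chosen nodes, at most poll_limit
    for polled in rng.sample(others, min(poll_limit, len(others))):
        # The polled node accepts only if one more task keeps it at or below the threshold
        if queues[polled] + 1 <= threshold:
            # Selection policy: move the newly arrived task, which has not started yet
            queues[node] -= 1
            queues[polled] += 1
            return polled
    return node  # no receiver found within poll_limit: run the task locally


queues = [3, 0, 5]
print(on_task_arrival(0, queues, threshold=2, poll_limit=2, rng=random.Random(42)))  # 1
print(queues)  # [3, 1, 5]
```

Under the Random location policy the sender would skip polling and send the task to a randomly chosen node; under Shortest it would poll several nodes and pick the one with the shortest queue.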
Sender-Initiated Algorithms – cont.
• Performance analysis
  – Instability at high system loads
    • When system loads are high, sender-initiated algorithms can make the system unstable
    • At high system loads, no node is likely to be lightly loaded, so the probability that a sender will find a receiver is very low
    • However, the polling activity increases as the rate at which work arrives increases, and these polls consume CPU cycles on nodes that are already overloaded
  – Performance at low system loads
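The high-load argument can be made concrete with a back-of-the-envelope model. Assume (a simplification, not from the slides) that each polled node is independently a receiver with probability `q`, and that a sender stops after finding a receiver or after `poll_limit` polls. The function name and parameters are hypothetical.

```python
from typing import Tuple


def poll_outcome(q: float, poll_limit: int) -> Tuple[float, float]:
    """Return (probability of finding a receiver, expected number of polls sent).

    q is the assumed probability that a randomly polled node is a receiver.
    """
    miss = 1.0 - q
    p_success = 1.0 - miss ** poll_limit
    # Poll k+1 is sent only if the first k polls all missed
    expected_polls = sum(miss ** k for k in range(poll_limit))
    return p_success, expected_polls


for q in (0.9, 0.5, 0.1, 0.01):
    p, polls = poll_outcome(q, poll_limit=5)
    print(f"q={q:<5} P(find receiver)={p:.3f} expected polls={polls:.2f}")
```

As `q` falls toward 0 (high system load), almost every arriving task triggers the full `poll_limit` polls and still fails to find a receiver, so polling overhead scales with the arrival rate. At low loads `q` is close to 1 and a single poll usually suffices.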
Receiver-Initiated Algorithms
• In receiver-initiated algorithms, an underloaded node initiates the load distribution
  – Transfer policy
  – Selection policy
  – Location policy
  – Information policy
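A matching receiver-initiated sketch follows. It assumes the transfer policy is checked when a task departs, that a node is a receiver when its queue drops below a threshold, and that a polled node gives up a task only if its own queue stays at or above that threshold afterwards. All names (`on_task_departure`, `threshold`, `poll_limit`) are hypothetical.

```python
import random
from typing import List, Optional


def on_task_departure(node: int, queues: List[int], threshold: int,
                      poll_limit: int, rng: random.Random) -> Optional[int]:
    """Handle a task finishing at `node`; return the node a task was pulled from, if any."""
    queues[node] -= 1  # the finished task leaves the local queue
    # Transfer policy: the node is a receiver if its queue has dropped below the threshold
    if queues[node] >= threshold:
        return None
    others = [n for n in range(len(queues)) if n != node]
    # Location policy: poll distinct, randomly chosen nodes, at most poll_limit
    for polled in rng.sample(others, min(poll_limit, len(others))):
        # The polled node gives up a task only if it stays at or above the threshold
        if queues[polled] - 1 >= threshold:
            # Selection policy: any task may be chosen; it has usually started already,
            # so this is typically a (costly) preemptive transfer
            queues[polled] -= 1
            queues[node] += 1
            return polled
    return None  # no sender found; retrying later is not modeled here


queues = [1, 5, 2]
print(on_task_departure(0, queues, threshold=2, poll_limit=2, rng=random.Random(7)))  # 1
print(queues)  # [1, 4, 2]
```

If no sender is found, a real implementation would typically poll again after the next task departure or after a timeout; that retry loop is left out of the sketch.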
Receiver-Initiated Algorithms – cont.
• Performance analysis
  – At high system loads, the probability of finding a sender is high, and thus a receiver can generally find a sender in a few polls
  – At low system loads, there are few senders but more receiver-initiated polls; these polls do not cause system instability because spare CPU cycles are available
• A drawback
  – Most transfers will be preemptive and thus expensive: by the time a receiver polls, the tasks waiting at a sender have usually already received some CPU service (e.g., under round-robin scheduling)
Empirical Comparison of Sender-Initiated and Receiver-Initiated Algorithms
Symmetrically Initiated Algorithms
• In symmetrically initiated algorithms, both senders and receivers initiate task transfers: senders search for receivers, and receivers search for senders
  – The above-average algorithm
    • Transfer policy
    • Location policy
      – Sender-initiated component
      – Receiver-initiated component
    • Selection policy
    • Information policy
Symmetrically Initiated Algorithms – cont.
• Sender-initiated component
  – A sender broadcasts a TooHigh message, sets a TooHigh timeout alarm, and listens for an Accept message
  – A receiver that receives a TooHigh message cancels its TooLow timeout, sends an Accept message to the sender, and increases its load value
  – On receiving an Accept message, if the site is still a sender, it chooses the best task to transfer and transfers it
  – If no Accept message has been received before the timeout, the sender broadcasts a ChangeAverage message to increase the average load estimates at the other nodes
Symmetrically Initiated Algorithms – cont.
• Receiver-initiated component
  – A receiver broadcasts a TooLow message, sets a TooLow timeout alarm, and starts listening for a TooHigh message
  – If a TooHigh message is received, it cancels its TooLow timeout, sends an Accept message to the sender, and increases its load value
  – If no TooHigh message is received before the timeout, the receiver broadcasts a ChangeAverage message to decrease the average load estimates at the other nodes
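The message handling of the above-average algorithm can be sketched as a per-node state machine. This is a simplified, synchronous sketch: message delivery and timers are left to a hypothetical caller, and `acceptable_range` (the half-width of the band around the estimated average) and the ChangeAverage step of 1 are assumed values, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AboveAverageNode:
    """Simplified sketch of one node; the caller delivers messages and fires timeouts."""
    load: int
    avg_estimate: float
    acceptable_range: float = 1.0  # hypothetical half-width of the "OK" band
    too_low_armed: bool = False    # True while the TooLow timeout is pending

    # Transfer policy: thresholds equidistant from the estimated average load
    def is_sender(self) -> bool:
        return self.load > self.avg_estimate + self.acceptable_range

    def is_receiver(self) -> bool:
        return self.load < self.avg_estimate - self.acceptable_range

    # Sender-initiated component: broadcast TooHigh (caller arms the TooHigh timeout)
    def start_sender_search(self) -> str:
        return "TooHigh"

    # Receiver-initiated component: broadcast TooLow and arm the TooLow timeout
    def start_receiver_search(self) -> str:
        self.too_low_armed = True
        return "TooLow"

    # Any node handling a broadcast TooHigh message
    def on_too_high(self) -> Optional[str]:
        if not self.is_receiver():
            return None
        self.too_low_armed = False  # cancel the TooLow timeout
        self.load += 1              # count the expected task so we do not over-accept
        return "Accept"

    # Sender handling an Accept: transfer only if still a sender
    def on_accept(self) -> bool:
        if not self.is_sender():
            return False
        self.load -= 1              # the chosen task leaves this node
        return True

    # Timeouts: broadcast ChangeAverage to adjust the other nodes' estimates
    def on_too_high_timeout(self) -> Tuple[str, int]:
        return ("ChangeAverage", +1)  # no Accept arrived: raise the average

    def on_too_low_timeout(self) -> Tuple[str, int]:
        self.too_low_armed = False
        return ("ChangeAverage", -1)  # no TooHigh arrived: lower the average

    def on_change_average(self, delta: int) -> None:
        self.avg_estimate += delta


sender = AboveAverageNode(load=6, avg_estimate=3.0)
receiver = AboveAverageNode(load=1, avg_estimate=3.0)
receiver.start_receiver_search()
print(receiver.on_too_high())  # Accept
print(receiver.on_too_high())  # None: its raised load is no longer below the band
print(sender.on_accept())      # True
```

Raising the receiver's load value on Accept is what keeps one receiver from answering every TooHigh broadcast at once.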
Symmetrically Initiated Algorithms – cont.
• Performance analysis
  – Instability at high system loads
    • Due to the sender-initiated components
Comparison
Adaptive Algorithms
• A stable symmetrically initiated algorithm
  – Each node keeps a senders list, a receivers list, and an OK list
    • The nodes in the system are classified as Sender (overloaded), Receiver (underloaded), or OK, using the information gathered through polling
A Stable Symmetrically Initiated Algorithm – cont.
• Sender-initiated component
  – The sender polls the node at the head of its receivers list
  – The polled node moves the sender to the head of its senders list and replies with a message indicating whether it is a receiver, sender, or OK node
  – The sender updates its lists based on the reply
  – If the polled node is a receiver, the sender transfers a task to it
  – The polling process stops if a receiver is found, if the sender's receivers list becomes empty, or if the number of polls reaches PollLimit
A Stable Symmetrically Initiated Algorithm – cont.
• Receiver-initiated component
  – Nodes are polled in the following order
    • Head to tail in its senders list
    • Tail to head in the OK list
    • Tail to head in the receivers list
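The list bookkeeping of the stable symmetrically initiated algorithm can be sketched with three deques, where index 0 is the head. This is a hypothetical sketch: the class and method names are illustrative, the polled node's own reaction (moving the poller to the head of its senders list) is not modeled, and the initial state (every other node is assumed to be a receiver) follows the usual textbook description rather than these slides.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Optional


class NodeLists:
    """Per-node senders/receivers/OK lists; index 0 of each deque is the head."""

    def __init__(self, me: int, all_nodes: Iterable[int]):
        # Assumption: initially every other node is believed to be a receiver
        self.lists: Dict[str, Deque[int]] = {
            "sender": deque(),
            "receiver": deque(n for n in all_nodes if n != me),
            "ok": deque(),
        }

    def move_to_head(self, node: int, status: str) -> None:
        """Remove `node` from whichever list holds it; put it at the head of `status`'s list."""
        for lst in self.lists.values():
            if node in lst:
                lst.remove(node)
        self.lists[status].appendleft(node)

    def sender_component(self, poll: Callable[[int], str], poll_limit: int) -> Optional[int]:
        """Poll the head of the receivers list; return the node that gets the task, or None."""
        polls = 0
        while self.lists["receiver"] and polls < poll_limit:
            polled = self.lists["receiver"][0]
            polls += 1
            status = poll(polled)  # the polled node replies "sender", "receiver" or "ok"
            if status == "receiver":
                return polled      # transfer the new task; the entry stays at the head
            self.move_to_head(polled, status)
        return None                # no receiver found: process the task locally

    def receiver_poll_order(self) -> List[int]:
        """Senders head->tail, then OK tail->head, then receivers tail->head."""
        return (list(self.lists["sender"])
                + list(reversed(self.lists["ok"]))
                + list(reversed(self.lists["receiver"])))


node = NodeLists(0, range(4))
replies = {1: "sender", 2: "ok", 3: "receiver"}
print(node.sender_component(replies.__getitem__, poll_limit=5))  # 3
print(node.receiver_poll_order())  # [1, 2, 3]
```

Polling the OK and receivers lists from the tail means a receiver tries the entries with the oldest information first, since those are the most likely to have changed state.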
A Stable Sender-Initiated Algorithm
• This algorithm uses the sender-initiated component of the stable symmetrically initiated algorithm
  – Each node is augmented by an array called the statevector
    • It records this node's own status (the list it belongs to) at every other node in the system
    • It is updated based on the information obtained during the polling stage
  – The receiver-initiated component is replaced by the following protocol
    • When a node becomes a receiver, it informs all the nodes that are misinformed about its current state
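A minimal sketch of the statevector bookkeeping follows, assuming each node starts in every other node's receivers list (carried over from the usual initialization of the stable symmetric algorithm). The class and method names are hypothetical, and only the statevector is modeled, not the message exchange itself.

```python
from typing import Dict, List


class StateVector:
    """Hypothetical sketch of the statevector kept by one node.

    entries[j] records which list ("sender", "receiver" or "ok") this node
    believes it currently occupies at node j.
    """

    def __init__(self, me: int, n_nodes: int):
        self.me = me
        # Assumption: every node starts in every other node's receivers list
        self.entries: Dict[int, str] = {j: "receiver" for j in range(n_nodes) if j != me}

    def record_poll_sent(self, polled: int) -> None:
        # Polling `polled` as a sender causes it to move this node to the
        # head of its senders list, so our entry for it becomes "sender"
        self.entries[polled] = "sender"

    def on_become_receiver(self) -> List[int]:
        """Return the misinformed nodes to notify, and mark them as informed."""
        misinformed = [j for j, status in self.entries.items() if status != "receiver"]
        for j in misinformed:
            # After the notification, node j lists this node as a receiver
            self.entries[j] = "receiver"
        return misinformed


sv = StateVector(me=0, n_nodes=4)
sv.record_poll_sent(2)
sv.record_poll_sent(3)
print(sv.on_become_receiver())  # [2, 3]: only these nodes need a message
print(sv.on_become_receiver())  # []: everyone is already up to date
```

The point of the statevector is visible in the second call: a node that becomes a receiver again sends no messages to nodes that already list it as a receiver, instead of broadcasting to everyone.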
Comparison
Performance Under Heterogeneous Workloads
Selecting a Suitable Load Sharing Algorithm
• The best algorithm depends on the system under consideration
  – For example, if the system never attains high loads, sender-initiated algorithms will give improved performance
  – Stable scheduling algorithms should be used for systems that can reach high loads
  – For systems with heterogeneous workloads, adaptive stable algorithms are preferable
Other Requirements of Load Distributing
• Scalability
  – The algorithm should work well in large distributed systems
• Location transparency
• Determinism
• Preemption
• Heterogeneity
Case Studies
• The V-System
• The Sprite system
• The Condor system
• The Stealth distributed scheduler