Multiprocessor Scheduling – Module 3.1

For a good summary on multiprocessor and real-time scheduling, visit: http://www.cs.uah.edu/~weisskop/osnotes_html/M8.html

Classifications of Multiprocessor Systems
• Loosely coupled multiprocessors, or clusters
  – Each processor has its own memory and I/O channels.
• Functionally specialized processors
  – Such as an I/O processor or an Nvidia GPGPU
  – Controlled by a master processor
• Tightly coupled multiprocessors (MCMP)
  – Processors share main memory
  – Controlled by the operating system
  – More economical than clusters

Types of Parallelism
• Bit-level parallelism
• Instruction-level parallelism
• Data parallelism
• Task parallelism (our focus)

Synchronization Granularity
• Refers to the frequency of synchronization, or parallelism, among processes in the system
• Five classes exist:
  – Independent (SI is not applicable)
  – Very coarse (2000 < SI < 1M)
  – Coarse (200 < SI < 2000)
  – Medium (20 < SI < 200)
  – Fine (SI < 20)
• SI is the synchronization interval, measured in instructions.

Wikipedia on Fine-Grained, Coarse-Grained, and Embarrassing Parallelism
• Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.
• An application exhibits fine-grained parallelism if its subtasks must communicate many times per second;
• it exhibits coarse-grained parallelism if they do not communicate many times per second;
• and it is embarrassingly parallel if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.

Independent Parallelism
• Multiple unrelated processes
• Each is a separate application or job, e.g. a spreadsheet, a word processor, etc.
• No synchronization
• When more than one processor is available, the average response time to users drops.

Coarse and Very Coarse-Grained Parallelism
• Very coarse: distributed processing across network nodes to form a single computing environment
• Coarse: synchronization among processes at a very gross level
• Good for concurrent processes running on a multiprogrammed uniprocessor
  – Can be supported on a multiprocessor with little change

Medium-Grained Parallelism
• Parallel processing or multitasking within a single application
• A single application is a collection of threads
• Threads usually interact frequently, leading to medium-grained synchronization.

Fine-Grained Parallelism
• Highly parallel applications
• Synchronization every few instructions (on very short events)
• Fills the gap between ILP (instruction-level parallelism) and medium-grained parallelism
• Can be found in small inner loops
  – Exploited with parallel-programming APIs such as MPI and OpenMP (see the sketch below)
• The OS should not intervene; this is usually handled by hardware
• In practice, this is a very specialized and fragmented area
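
As a hedged aside (not from the slides): a minimal OpenMP sketch in C of fine-grained, inner-loop parallelism. The file name and data are illustrative; compile with something like gcc -fopenmp dot.c.

/* dot.c – parallelize a small inner loop with OpenMP. */
#include <stdio.h>

int main(void) {
    enum { N = 1000000 };
    static double a[N], b[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) {     /* set up the inputs */
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* The runtime splits the iterations across hardware threads; the
     * reduction clause handles the per-iteration synchronization. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot product = %f\n", sum);
    return 0;
}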

Scheduling
• Scheduling on a multiprocessor involves three interrelated design issues:
  – Assignment of processes to processors
  – Use of multiprogramming on individual processors
    • Makes sense for processes (coarse-grained)
    • May not be good for threads (medium-grained)
  – Actual dispatching of a process
    • Which scheduling policy should we use: FCFS, RR, etc.? Sometimes a very sophisticated policy becomes counterproductive.

Assignment of Processes to Processors
• Treat processors as a pooled resource and assign processes to processors on demand, or
• Permanently assign each process to a processor
  – Dedicate a short-term queue to each processor
  – Less overhead: each processor does its own scheduling on its own queue
  – Disadvantage: a processor could be idle (have an empty queue) while another processor has a backlog

Assignment of Processes to Processors
• Global queue
  – Schedule to any available processor
  – During its lifetime, a process may run on different processors at different times
  – In an SMP architecture, context switching can be done at small cost
• Master/slave architecture
  – Key kernel functions always run on a particular processor
  – The master is responsible for scheduling
  – A slave sends service requests to the master
  – Synchronization is simplified
  – Disadvantages:
    • Failure of the master brings down the whole system
    • The master can become a performance bottleneck

Assignment of Processes to Processors
• Peer architecture
  – The operating system can execute on any processor
  – Each processor does self-scheduling from a pool of available processes
  – Complicates the operating system:
    • Must make sure two processors do not choose the same process (see the sketch below)
    • Needs lots of synchronization
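
As a hedged aside (not from the slides): a minimal C11 sketch of the self-scheduling hazard in a peer architecture. Each peer claims a process with an atomic compare-and-swap so no two peers pick the same one; the names and sizes are made up.

#include <stdatomic.h>
#include <stdio.h>

#define NPROCS 5

static atomic_int claimed[NPROCS];     /* 0 = ready, 1 = claimed */

/* Each peer calls this to claim work; only one peer can flip a slot
 * from 0 to 1, so no process is ever dispatched twice. */
static int claim_next(void) {
    for (int i = 0; i < NPROCS; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&claimed[i], &expected, 1))
            return i;                  /* this peer won the race */
    }
    return -1;                         /* nothing left to run */
}

int main(void) {
    printf("peer 0 claimed process %d\n", claim_next());
    printf("peer 1 claimed process %d\n", claim_next());
    return 0;
}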

Process Scheduling in Today’s SMP
• An M/M/M/K queueing system
  – A single queue for all processes
  – Multiple queues are used only for priorities
  – All queues feed the common pool of processors
• The specific scheduling discipline is less important with more than one processor
  – A simple FCFS discipline with a static priority may suffice for a multiprocessor system
  – (Illustrated with a graph on p. 460 of the text)
  – In conclusion, the specific scheduling discipline is much less important with SMP than with a uniprocessor
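
As a hedged refresher (standard queueing theory, not from the slides; this reads the slide's M/M/M/K as a Markovian queue with m servers): for arrival rate $\lambda$ and per-server service rate $\mu$, the per-server utilization is

$$\rho = \frac{\lambda}{m\mu}, \qquad \text{stable only when } \rho < 1.$$

Adding processors (larger $m$) lowers $\rho$ for a fixed workload, one intuition for why the exact discipline matters less on an SMP.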

Threads
• A thread executes separately from the rest of its process
• An application can be a set of threads that cooperate and execute concurrently in the same address space
• Running threads on separate processors can yield a dramatic gain in performance

Multiprocessor Thread Scheduling – Four General Approaches (1/2)
• Load sharing
  – Processes are not assigned to a particular processor
• Gang scheduling
  – A set of related threads is scheduled to run on a set of processors at the same time

Multiprocessor Thread Scheduling – Four General Approaches (2/2)
• Dedicated processor assignment
  – Threads are assigned to a specific processor (each thread runs on its own processor)
  – When the program terminates, its processors are returned to the pool of available processors
• Dynamic scheduling
  – The number of threads can be altered during the course of execution

Load Sharing
• Load is distributed evenly across the processors
• No centralized scheduler is required
  – The OS runs on every processor to select the next thread
• Uses a global queue of ready threads (see the sketch below)
  – Usually with an FCFS policy
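
As a hedged aside (not from the slides): a minimal POSIX-threads sketch of load sharing in C. A single mutex-protected global queue is drained in FCFS order by workers standing in for processors; all names and sizes are made up.

#include <pthread.h>
#include <stdio.h>

#define NTASKS   8
#define NWORKERS 4

static int queue[NTASKS];             /* the global ready queue */
static int head = 0;                  /* next task to dispatch (FCFS) */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&lock);    /* every worker contends here */
        if (head == NTASKS) {         /* queue is empty: stop */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        int task = queue[head++];     /* take the oldest ready task */
        pthread_mutex_unlock(&lock);
        printf("worker %ld runs task %d\n", id, task);
    }
}

int main(void) {
    pthread_t w[NWORKERS];
    for (int i = 0; i < NTASKS; i++) queue[i] = i;
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&w[i], NULL, worker, (void *)i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(w[i], NULL);
    return 0;
}

Note how the single lock is exactly the bottleneck the next slide complains about.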

Disadvantages of Load Sharing
• The central queue needs mutual exclusion
  – It may become a bottleneck when more than one processor looks for work at the same time
  – This is a noticeable problem when there are many processors
• Preempted threads are unlikely to resume execution on the same processor
  – Cache use is less efficient
• If all threads are in the global queue, all threads of a program will not gain access to the processors at the same time
  – Performance is compromised if coordination among threads is high

Gang Scheduling
• Simultaneous scheduling of the threads that make up a single process
• Useful for applications whose performance severely degrades when any part of the application is not running
• Threads often need to synchronize with each other

Advantages
• Closely related threads execute in parallel
• Synchronization blocking is reduced
• Less context switching
• Scheduling overhead is reduced, since a single synchronization action (e.g., one signal()) may affect many threads (the barrier sketch below shows such a workload)
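
As a hedged aside (not from the slides): a POSIX sketch in C of a gang-friendly workload. The threads synchronize at a barrier on every step, so little progress is made unless the whole gang runs at the same time; names and counts are made up.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NSTEPS   3

static pthread_barrier_t bar;

static void *step_worker(void *arg) {
    long id = (long)arg;
    for (int s = 0; s < NSTEPS; s++) {
        printf("thread %ld computes step %d\n", id, s);
        /* No thread proceeds past the barrier until all arrive; if one
         * thread is descheduled, the rest of the gang just waits. */
        pthread_barrier_wait(&bar);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&bar, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, step_worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&bar);
    return 0;
}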

Scheduling Groups – two time-slicing divisions (figure)

Dedicated Processor Assignment – Affinitization (1/2)
• When an application is scheduled, each of its threads is assigned a processor that remains dedicated (affinitized) to that thread until the application runs to completion
• No multiprogramming of processors, i.e. one processor serves one specific thread and no other
• Some processors may be idle while their threads block
• Eliminates context switches; with certain types of applications, this can save enough time to compensate for the possible idle-time penalty
• In a highly parallel environment with hundreds of processors, processor utilization is not a major issue, but performance is. The total avoidance of switching results in a substantial speedup. (A Linux affinity sketch follows.)
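
As a hedged aside (not from the slides): a minimal, Linux-specific sketch of affinitization in C, pinning the calling thread to one CPU with the GNU extension pthread_setaffinity_np(). The CPU number is arbitrary; compile with gcc -pthread.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);                   /* start with an empty CPU mask */
    CPU_SET(1, &set);                 /* allow only CPU 1 */

    /* Pin the calling thread; a dedicated-assignment system would do
     * this once per thread, giving each thread its own CPU. */
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
        fprintf(stderr, "could not set affinity\n");
        return 1;
    }
    printf("now running on CPU %d\n", sched_getcpu());
    return 0;
}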

Dedicated Processor Assignment – Affinitization (2/2)
• In dedicated assignment, when the number of threads exceeds the number of available processors, efficiency drops
  – Due to thread preemption, context switching, suspension of other threads, and cache pollution
• Notice that both gang scheduling and dedicated assignment are more concerned with allocation issues than with scheduling issues
• The important question becomes "How many processors should a process be assigned?" rather than "How shall I choose the next process?"

Dynamic Scheduling
• The number of threads within a process is variable
• The work is shared between the OS and the application
• When a job originates, it requests a certain number of processors. The OS grants some or all of the request, based on the number of processors currently available.
• The application itself can then decide which threads run when and on which processors. This requires language support, as would be provided by thread libraries.
• When processors become free due to the termination of threads or processes, the OS can allocate them as needed to satisfy pending requests.
• Simulations have shown that this approach is superior to both gang scheduling and dedicated assignment. (A sketch of the application's side follows.)
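
As a hedged aside (not from the slides): a minimal Linux/glibc sketch in C of the application's side of dynamic scheduling, asking how many processors are online and sizing the worker pool to match. The cap of 8 workers is made up.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *work(void *arg) {
    printf("worker %ld started\n", (long)arg);
    return NULL;
}

int main(void) {
    /* The "request": take what the system currently offers, capped at
     * the 8 workers this toy job can actually use. */
    long avail = sysconf(_SC_NPROCESSORS_ONLN);
    long n = avail < 8 ? avail : 8;

    pthread_t t[8];
    printf("%ld processors online, creating %ld workers\n", avail, n);
    for (long i = 0; i < n; i++)
        pthread_create(&t[i], NULL, work, (void *)i);
    for (long i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return 0;
}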