Multiprocessor Scheduling, Module 3.1
For a good summary on multiprocessor and real-time scheduling, visit: http://www.cs.uah.edu/~weisskop/osnotes_html/M8.html
Classifications of Multiprocessor Systems
• Loosely coupled multiprocessors, or clusters
  – Each processor has its own memory and I/O channels
• Functionally specialized processors
  – Such as an I/O processor or an Nvidia GPGPU
  – Controlled by a master processor
• Tightly coupled multiprocessing (MCMP)
  – Processors share main memory
  – Controlled by the operating system
  – More economical than clusters
Types of Parallelism
• Bit-level parallelism
• Instruction-level parallelism
• Data parallelism
• Task parallelism ← our focus
Synchronization Granularity
• Refers to the frequency of synchronization, or parallelism, among processes in the system
• Five classes exist:
  – Independent (SI is not applicable)
  – Very coarse (2000 < SI < 1 M)
  – Coarse (200 < SI < 2000)
  – Medium (20 < SI < 200)
  – Fine (SI < 20)
SI is called the synchronization interval, and is measured in instructions.
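The five classes above can be captured in a small helper. This is a toy illustration; the handling of exact boundary values, and mapping intervals above 1 M instructions to "independent", are assumptions, since the slide leaves those cases open:

```python
# Toy classifier for the five synchronization-granularity classes.
# Boundary handling at the exact cut-off values is an assumption.
def granularity_class(si):
    """Map a synchronization interval SI (in instructions) to a class.

    None means the processes never synchronize (independent parallelism).
    """
    if si is None:
        return "independent"
    if si < 20:
        return "fine"
    if si < 200:
        return "medium"
    if si < 2000:
        return "coarse"
    if si < 1_000_000:
        return "very coarse"
    return "independent"   # assumed: beyond 1 M instructions, effectively independent
```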
Wikipedia on Fine-Grained, Coarse-Grained, and Embarrassing Parallelism
• Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.
• An application exhibits fine-grained parallelism if its subtasks must communicate many times per second.
• It exhibits coarse-grained parallelism if they do not communicate many times per second.
• It is embarrassingly parallel if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.
Independent Parallelism
• Multiple unrelated processes
• Separate applications or jobs, e.g. a spreadsheet, a word processor, etc.
• No synchronization
• More than one processor is available
  – Average response time to users is lower
Coarse and Very Coarse-Grained Parallelism
• Very coarse: distributed processing across network nodes to form a single computing environment
• Coarse: synchronization among processes at a very gross level
• Good for concurrent processes running on a multiprogrammed uniprocessor
  – Can be supported on a multiprocessor with little change
Medium-Grained Parallelism
• Parallel processing or multitasking within a single application
• A single application is a collection of threads
• Threads usually interact frequently, leading to medium-grained synchronization
Fine-Grained Parallelism
• Highly parallel applications
• Synchronization every few instructions (on very short events)
• Fills the gap between ILP (instruction-level parallelism) and medium-grained parallelism
• Can be found in small inner loops
  – Use of MPI and OpenMP programming interfaces
• The OS should not intervene; this is usually done in hardware
• In practice, this is a very specialized and fragmented area
Scheduling
• Scheduling on a multiprocessor involves three interrelated design issues:
  – Assignment of processes to processors
  – Use of multiprogramming on individual processors
    • Makes sense for processes (coarse-grained)
    • May not be good for threads (medium-grained)
  – Actual dispatching of a process
    • Which scheduling policy should we use: FCFS, RR, etc.? Sometimes a very sophisticated policy becomes counterproductive.
Assignment of Processes to Processors
• Treat processors as a pooled resource and assign processes to processors on demand
• Or permanently assign each process to a processor
  – Dedicate a short-term queue to each processor
  – Less overhead: each processor does its own scheduling on its queue
  – Disadvantage: a processor could be idle (with an empty queue) while another processor has a backlog
Assignment of Processes to Processors
• Global queue
  – Schedule to any available processor
  – During its lifetime, a process may run on different processors at different times
  – In an SMP architecture, context switching can be done at small cost
• Master/slave architecture
  – Key kernel functions always run on a particular processor
  – The master is responsible for scheduling
  – A slave sends service requests to the master
  – Synchronization is simplified
  – Disadvantages:
    • Failure of the master brings down the whole system
    • The master can become a performance bottleneck
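The master/slave division of labor can be sketched with Python threads standing in for processors: only the master runs the "kernel" scheduling function, and slaves merely submit service requests. All names here (`master`, `slave`, the request queue) are hypothetical, chosen for illustration:

```python
import queue
import threading

requests = queue.Queue()   # slave -> master service requests
results = {}               # filled in by the master only

def master(n_tasks):
    # Only the master makes scheduling decisions; it serves requests
    # until every expected task has been handled.
    served = 0
    while served < n_tasks:
        slave_id, task = requests.get()
        results[task] = f"scheduled-for-slave-{slave_id}"
        served += 1

def slave(slave_id, tasks):
    # Slaves never schedule; they only ask the master for service.
    for t in tasks:
        requests.put((slave_id, t))

m = threading.Thread(target=master, args=(4,))
m.start()
for i in range(2):
    threading.Thread(target=slave, args=(i, [2 * i, 2 * i + 1])).start()
m.join()   # all four requests have been served by the master
```

Note how the sketch also exposes the stated disadvantage: every request funnels through one thread, so the master is a serialization point.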
Assignment of Processes to Processors
• Peer architecture
  – The operating system can execute on any processor
  – Each processor does self-scheduling from a pool of available processes
  – Complicates the operating system
    • Must make sure two processors do not choose the same process
    • Needs lots of synchronization
Process Scheduling in Today’s SMP
• M/M/M/K queueing system
  – A single queue for all processes
  – Multiple queues are used for priorities
  – All queues feed the common pool of processors
• The specific scheduling discipline is less important with more than one processor
  – A simple FCFS discipline with static priority may suffice for a multiprocessor system
  – Illustrated using the graph on p. 460
  – In conclusion, the specific scheduling discipline is much less important with SMP than with a uniprocessor
Threads
• A thread executes separately from the rest of its process
• An application can be a set of threads that cooperate and execute concurrently in the same address space
• Threads running on separate processors yield a dramatic gain in performance
Multiprocessor Thread Scheduling – Four General Approaches (1/2)
• Load sharing
  – Processes are not assigned to a particular processor
• Gang scheduling
  – A set of related threads is scheduled to run on a set of processors at the same time
Multiprocessor Thread Scheduling – Four General Approaches (2/2)
• Dedicated processor assignment
  – Threads are assigned to a specific processor (each thread runs on its own processor)
  – When the program terminates, its processors are returned to the pool of available processors
• Dynamic scheduling
  – The number of threads can be altered during the course of execution
Load Sharing
• Load is distributed evenly across the processors
• No centralized scheduler required
  – The OS runs on every processor to select the next thread
• Uses a global queue of ready threads
  – Usually an FCFS policy
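Load sharing can be sketched in Python with threads standing in for processors: each "processor" runs the same selection code against one global FCFS queue, with no centralized scheduler. All names are illustrative:

```python
import queue
import threading

ready = queue.Queue()        # the single global FCFS ready queue
done = []                    # (cpu_id, work item) pairs, for inspection
done_lock = threading.Lock()

def cpu(cpu_id):
    # Every "processor" self-selects work from the shared queue;
    # queue.Queue provides the mutual exclusion the slides mention.
    while True:
        try:
            work = ready.get_nowait()
        except queue.Empty:
            return           # no more ready threads: this processor idles
        with done_lock:
            done.append((cpu_id, work))

for t in range(8):           # eight ready "threads", numbered 0..7
    ready.put(t)
cpus = [threading.Thread(target=cpu, args=(i,)) for i in range(3)]
for c in cpus:
    c.start()
for c in cpus:
    c.join()
# Every work item was dispatched exactly once, FCFS from the shared queue.
```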
Disadvantages of Load Sharing
• The central queue needs mutual exclusion
  – May be a bottleneck when more than one processor looks for work at the same time
  – A noticeable problem when there are many processors
• A preempted thread is unlikely to resume execution on the same processor
  – Cache use is less efficient
• If all threads are in the global queue, all threads of a program will not gain access to the processors at the same time
  – Performance is compromised if coordination among threads is high
Gang Scheduling
• Simultaneous scheduling of the threads that make up a single process
• Useful for applications whose performance severely degrades when any part of the application is not running
• Threads often need to synchronize with each other
Advantages
• Closely related threads execute in parallel
• Synchronization blocking is reduced
• Less context switching
• Scheduling overhead is reduced, as a single sync to signal() may affect many threads
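The core idea of gang scheduling — all threads of a gang occupy the same time slot, so they run simultaneously — can be sketched as a toy timetable builder. This is a simplification under assumed names (`gang_schedule`, `slots`); real schedulers also handle priorities, preemption, and packing of small gangs:

```python
def gang_schedule(gangs, n_cpus):
    """Build a list of time slots; each slot maps cpu -> (gang_id, thread).

    A gang's threads are placed together in the same slot(s), so no thread
    ever runs in a slot without its gang-mates also running.
    """
    slots = []
    for gang_id, threads in enumerate(gangs):
        # If a gang is larger than the machine, it spans several full slots.
        for start in range(0, len(threads), n_cpus):
            chunk = threads[start:start + n_cpus]
            slots.append({cpu: (gang_id, th) for cpu, th in enumerate(chunk)})
    return slots

# Two gangs on a 4-CPU machine: each gang gets a whole slot to itself,
# even though gang 1 leaves two CPUs idle in its slot.
slots = gang_schedule([["a1", "a2", "a3"], ["b1", "b2"]], n_cpus=4)
```

The idle CPUs in gang 1's slot show the cost side of gang scheduling: slots are charged per gang, not per thread.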
Scheduling Groups – two time-slicing divisions (figure)
Dedicated Processor Assignment – Affinitization (1/2)
• When an application is scheduled, each of its threads is assigned a processor that remains dedicated/affinitized to that thread until the application runs to completion
• No multiprogramming of processors, i.e. one processor per specific thread, and no other thread
• Some processors may be idle, as threads may block
• Eliminates context switches; with certain types of applications this can save enough time to compensate for the possible idle-time penalty
• However, in a highly parallel environment with hundreds of processors, processor utilization is not a major issue, but performance is. The total avoidance of switching results in a substantial speedup.
Dedicated Processor Assignment – Affinitization (2/2)
• In dedicated assignment, when the number of threads exceeds the number of available processors, efficiency drops
  – Due to thread preemption, context switching, suspension of other threads, and cache pollution
• Notice that both gang scheduling and dedicated assignment are more concerned with allocation issues than scheduling issues
• The important question becomes "How many processors should a process be assigned?" rather than "Which process should I choose next?"
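On Linux, affinitizing the calling process to a single CPU can be sketched with `os.sched_setaffinity`. The function name `pin_to_one_cpu` is ours, and availability of the affinity API is platform-dependent, so the sketch degrades gracefully elsewhere:

```python
import os

def pin_to_one_cpu():
    """Restrict the calling process to a single CPU (Linux-only sketch)."""
    if not hasattr(os, "sched_setaffinity"):
        return None                        # affinity API unavailable (e.g. macOS)
    cpu = min(os.sched_getaffinity(0))     # pick one CPU we are allowed to use
    os.sched_setaffinity(0, {cpu})         # dedicate this process to that CPU
    return os.sched_getaffinity(0)         # should now be a one-element set
```

This pins a whole process; per-thread affinitization as described on the slide would use the same idea at thread granularity (e.g. `pthread_setaffinity_np` in C).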
Dynamic Scheduling
• The number of threads within a process is variable
• The work is shared between the OS and the application
• When a job originates, it requests a certain number of processors. The OS grants some or all of the request, based on the number of processors currently available.
• The application itself then decides which threads run when, and on which processors. This requires language support, as provided by thread libraries.
• When processors become free due to the termination of threads or processes, the OS can allocate them as needed to satisfy pending requests.
• Simulations have shown that this approach is superior to gang scheduling and dedicated scheduling.
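The request/grant step described above can be sketched as a small processor pool: a job asks for k processors, and the OS grants min(k, available), reclaiming them on release. The class and method names are hypothetical, for illustration only:

```python
class ProcessorPool:
    """Toy model of the OS side of dynamic scheduling."""

    def __init__(self, total):
        self.available = total

    def request(self, wanted):
        # Grant some or all of the request, based on current availability;
        # the application must cope with receiving fewer than it asked for.
        granted = min(wanted, self.available)
        self.available -= granted
        return granted

    def release(self, count):
        # Processors freed by terminating threads return to the pool.
        self.available += count

pool = ProcessorPool(8)
g1 = pool.request(5)   # first job: granted all 5
g2 = pool.request(5)   # second job: only 3 remain, so granted 3
pool.release(g1)       # first job terminates; its processors are reclaimed
```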