Outline
• OS schedulers
• Unix scheduling
• Linux 2.4 scheduler
• Linux 2.6 scheduler
  – O(1) scheduler
  – CFS

Introduction: preemptive & cooperative multitasking
• A multitasking operating system is one that can interleave the execution of more than one process, so that they appear to run simultaneously.
• Multitasking operating systems come in two flavors: cooperative multitasking and preemptive multitasking.
  – Linux provides preemptive multitasking.
  – Mac OS 9 and earlier are the most notable examples of cooperative multitasking.

UNIX Scheduling Policy
• Scheduling policy determines what runs when, with two goals:
  – fast process response time (low latency)
  – maximal system utilization (high throughput)
• Process classification:
  – I/O-bound processes spend much of their time submitting and waiting on I/O requests.
  – Processor-bound processes spend much of their time executing code.
• Unix variants tend to favor I/O-bound processes, thus providing good process response time.

Linux scheduler – Process Priority
• Linux's priority-based scheduling
  – Ranks processes based on their worth and need for processor time.
  – Processes with a higher priority also receive a longer timeslice.
  – Both the user and the system may set a process's priority to influence the scheduling behavior of the system.
• Dynamic priority-based scheduling
  – Begins with an initial base priority.
  – Then enables the scheduler to increase or decrease the priority dynamically to fulfill scheduling objectives.
  – E.g., a process that spends more time waiting on I/O receives an elevated dynamic priority.

Linux scheduler – Priority Ranges
• Two separate priority ranges:
  – Nice value, from -20 to +19 with a default of 0.
    • Larger nice values correspond to a lower priority (you are being "nice" to the other processes on the system).
  – Real-time priority, ranging by default from 0 to 99.
    • All real-time processes are at a higher priority than normal processes.
    • Linux implements real-time priorities in accordance with the POSIX standards on the matter.

Scheduler – priority [figure]

Timeslice
• The timeslice is the numeric value that represents how long a task can run until it is preempted.
  – Too short => large process-switching overhead
  – Too long => poor interactive response
• Linux's CFS scheduler does not directly assign timeslices to processes.
  – CFS assigns each process a proportion of the processor.
  – The amount of processor time that a process receives is a function of the load of the system.

2.4 SCHEDULER

2.4 scheduler – SMP
[Figure, three frames: CPUs labeled "busy" and "IDLE" around a single run queue; the idle CPU performs "search & estimate" on the run queue and then becomes busy]

2.4 scheduler
• Non-preemptible kernel
  – Set p->need_resched if schedule() should be invoked at the "next opportunity" (on the transition from kernel mode to user mode).
• Round-robin
  – task_struct->counter: number of clock ticks left to run in this scheduling slice, decremented by a timer.

2.4 scheduler
1. Check whether schedule() was invoked from an interrupt handler (due to a bug) and panic if so.
2. Use spin_lock_irq() to lock 'runqueue_lock'.
3. Check whether a task is 'runnable':
   – in TASK_RUNNING state, or
   – in TASK_INTERRUPTIBLE state with a signal pending
4. Examine the 'goodness' of each process.
5. Context switch.

2.4 scheduler – 'goodness'
• 'goodness': a score used to identify the best candidate among all processes in the runqueue list.
  – 'goodness' = 0: the entity has exhausted its quantum.
  – 0 < 'goodness' < 1000: the entity is a conventional process/thread that has not exhausted its quantum; a higher value denotes a higher level of goodness.

2.4 scheduler – 'goodness' (to improve multithreading performance)
    if (p->mm == prev->mm)
        return p->counter + p->priority + 1;
    else
        return p->counter + p->priority;
• A small bonus is given to the task p if it shares the address space with the previous task.

2.4 scheduler – SMP
• The scheduler examines the processor field of each process and gives a fixed bonus (PROC_CHANGE_PENALTY, usually 15) to the process that last executed on the 'this_cpu' CPU.
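The goodness rules on the last three slides can be put together in a short sketch. This is a simplified illustration and not the kernel's actual function: `struct task` is a hypothetical, trimmed-down stand-in for `task_struct`, `pick_next()` is an invented helper name, and real-time handling is left out. The linear scan in `pick_next()` is the part that makes selection cost grow with the number of runnable tasks.

```c
#include <assert.h>
#include <stddef.h>

#define PROC_CHANGE_PENALTY 15  /* bonus value quoted on the slide */

/* Hypothetical, trimmed-down stand-in for the 2.4 task_struct. */
struct task {
    int counter;    /* clock ticks left in the current slice */
    int priority;   /* static priority */
    int processor;  /* CPU the task last ran on */
    const void *mm; /* address space identity */
};

/* Score one candidate for this_cpu; 0 means the quantum is used up. */
static int goodness(const struct task *p, const struct task *prev, int this_cpu)
{
    int weight;

    if (p->counter == 0)
        return 0;
    weight = p->counter + p->priority;
    if (p->mm == prev->mm)
        weight += 1;                    /* shares the address space */
    if (p->processor == this_cpu)
        weight += PROC_CHANGE_PENALTY;  /* last ran on this CPU */
    return weight;
}

/* Linear scan over n runnable tasks: this is the O(n) part. */
static const struct task *pick_next(const struct task *rq, size_t n,
                                    const struct task *prev, int this_cpu)
{
    const struct task *best = NULL;
    int best_weight = 0;

    assert(rq != NULL && prev != NULL);
    for (size_t i = 0; i < n; i++) {
        int w = goodness(&rq[i], prev, this_cpu);
        if (w > best_weight) {
            best_weight = w;
            best = &rq[i];
        }
    }
    return best;  /* NULL when every quantum is exhausted */
}
```

When `pick_next()` returns NULL, every runnable task has used up its quantum, which is the situation that triggers the timeslice recalculation discussed next.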

Recalculating Timeslices (kernel 2.4)
• Problems:
  – It can take a long time. Worse, it scales as O(n) for n tasks on the system.
  – Recalculation must occur under some sort of lock protecting the task list and the individual process descriptors, which results in high lock contention.
  – The nondeterministic duration of the recalculation is a problem for deterministic real-time programs.

Process classification
• Definition:
  – I/O-bound processes spend much of their time submitting and waiting on I/O requests.
  – Processor-bound processes spend much of their time executing code.
• Linux tends to favor I/O-bound processes, thus providing good process response time.
• How to classify processes?

[Figure: run-queue diagram with the labels "High priority", "Time quantum ≠ 0 (I/O bound)" and "Time quantum = 0 (CPU bound) tasks"; individual tasks are marked tq=0, tq≠0 or tq=?]

Scheduling policy
• time_quantum_new = bonus_I/O + time_static
• time_quantum_new = time_quantum_old / 2 + time_quantum_table[static_priority]
• dynamic_priority ≈ time_quantum_new
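A worked example may help show why this recurrence favors I/O-bound tasks. The sketch below assumes a hypothetical base quantum of 6 ticks and integer arithmetic; `recalc_quantum()` and `quantum_after()` are invented names. A task that sleeps through each epoch keeps its unused ticks, so half of them carry over on every recalculation, while a task that burns its whole quantum always restarts from the base value.

```c
#include <assert.h>

/* Hypothetical base quantum in ticks for one static priority. */
enum { BASE_QUANTUM = 6 };

/* One recalculation: keep half of the unused ticks, add the base quantum. */
static int recalc_quantum(int old_quantum, int base_quantum)
{
    assert(old_quantum >= 0 && base_quantum > 0);
    return old_quantum / 2 + base_quantum;
}

/* Quantum after `epochs` recalculations. A task that `sleeps` through
 * every epoch keeps its ticks; one that runs enters each with zero left. */
static int quantum_after(int epochs, int sleeps)
{
    int q = BASE_QUANTUM;

    for (int i = 0; i < epochs; i++)
        q = recalc_quantum(sleeps ? q : 0, BASE_QUANTUM);
    return q;
}
```

With these numbers the sleeping task's quantum goes 6, 9, 10, 11 and then stays at 11, always below twice the base value, whereas the CPU-bound task stays at 6. Since the dynamic priority tracks the quantum, the sleeper ends up with the higher goodness.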

[Figure, three further frames of the same diagram, with the labels "Time quantum ≠ 0 (I/O bound)" and "Time quantum = 0 (CPU bound)" and the task markers tq=0, tq≠0 and tq=?]

2.4 scheduler – performance
• The algorithm does not scale well.
  – It is inefficient to recompute all dynamic priorities at once.
• The predefined quantum is too large for high system loads (for example, a server).
• The I/O-bound process boosting strategy is not optimal.
  – It is a good strategy to ensure a short response time for interactive programs, but...
  – some batch programs with almost no user interaction are also I/O-bound.

2.6 SCHEDULER

2.6 scheduler
[Figure: two run queues with task migration (put + pull) between them]

2.6 scheduler – User Preemption
• User preemption can occur
  – when returning to user-space from a system call
  – when returning to user-space from an interrupt handler

2.6 scheduler – Kernel Preemption
• The Linux kernel is a fully preemptive kernel.
  – It is possible to preempt a task at any point, so long as the kernel is in a state in which it is safe to reschedule.
  – "Safe to reschedule": the kernel does not hold a lock.
• The Linux design:
  – A preemption counter, preempt_count, is added to each process's thread_info.
  – The counter is incremented once for each lock that is acquired and decremented once for each lock that is released.
• Kernel preemption can also occur explicitly, when a task in the kernel blocks or explicitly calls schedule().
  – In that case no additional logic is required to ensure that the kernel is in a state that is safe to preempt.

Kernel Preemption
• Kernel preemption can occur
  – when an interrupt handler exits, before returning to kernel-space
  – when kernel code becomes preemptible again
  – if a task in the kernel explicitly calls schedule()
  – if a task in the kernel blocks (which results in a call to schedule())
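The preempt_count bookkeeping can be illustrated with a small user-space model. This is only a sketch of the idea: `struct thread_state` and the three helper functions are hypothetical names, and the real kernel does the same accounting through its locking primitives rather than explicit calls like these.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task bookkeeping, loosely modeled on thread_info. */
struct thread_state {
    int preempt_count;   /* number of locks currently held */
    bool need_resched;   /* a reschedule has been requested */
};

static void lock_acquired(struct thread_state *t)
{
    t->preempt_count++;
}

static void lock_released(struct thread_state *t)
{
    assert(t->preempt_count > 0);  /* release without acquire is a bug */
    t->preempt_count--;
}

/* Preemption is safe only when no lock is held and a reschedule is pending. */
static bool can_preempt(const struct thread_state *t)
{
    return t->preempt_count == 0 && t->need_resched;
}
```

The second bullet on the slide, "when kernel code becomes preemptible again", corresponds to the moment `lock_released()` brings the counter back to zero while `need_resched` is set.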

O(1) & CFS scheduler
• 2.5 ~ 2.6.22: O(1) scheduler
  – Time complexity: O(1)
  – Uses a "run queue" (an active queue and an expired queue) to realize the ready queue
• 2.6.23 ~ present: Completely Fair Scheduler (CFS)
  – Time complexity: O(log n)
  – The ready queue is implemented as a red-black tree

2.6 SCHEDULER – O(1)

O(1) scheduler
• Implement fully O(1) scheduling.
  – Every algorithm in the new scheduler completes in constant time, regardless of the number of running processes (since the 2.5 kernel).
• Implement perfect SMP scalability.
  – Each processor has its own locking and individual runqueue.
• Implement improved SMP affinity.
  – Attempt to group tasks to a specific CPU and continue to run them there.
  – Only migrate tasks from one CPU to another to resolve imbalances in runqueue sizes.
• Provide good interactive performance.
  – Even during considerable system load, the system should react and schedule interactive tasks immediately.
• Provide fairness.
  – No process should find itself starved of timeslice for any reasonable amount of time. Likewise, no process should receive an unfairly high amount of timeslice.
• Optimize for the common case of only one or two runnable processes, yet scale well to multiple processors, each with many processes.

The Priority Arrays
• Each runqueue contains two priority arrays (defined in kernel/sched.c as struct prio_array).
  – Active array: all tasks with timeslice left.
  – Expired array: all tasks that have exhausted their timeslice.
• Priority arrays provide O(1) scheduling.
  – Each priority array contains one queue of runnable processes per priority level.
  – The priority arrays also contain a priority bitmap used to efficiently discover the highest-priority runnable task in the system.

The Linux O(1) scheduler algorithm [figure]

• Each runqueue contains two priority arrays: active and expired.
• Each of these priority arrays contains a list of tasks indexed according to priority.
[Figure: runqueue with priority queues (0-139) in the active and expired arrays]

• Linux assigns higher-priority tasks a longer time slice.
• Time quantum ≈ 1/priority
[Figure: runqueue holding tsk1, tsk2 and tsk3 in the active and expired arrays]

• Linux chooses the task with the highest priority from the active array for execution.
[Figure: runqueue holding tsk1, tsk2 and tsk3 in the active and expired arrays]

[Figure, three frames: round-robin among tsk1, tsk2 and tsk3 across the active and expired arrays]

• Most tasks have dynamic priorities that are based on their "nice" value (static priority) plus or minus 5.
• Interactivity of a task ≈ 1/sleep_time
• dynPrio = staticPrio + bonus, where bonus ranges from -5 to +5 and bonus ≈ 1/sleep_time
[Figure: runqueue with tsk1, tsk2 and tsk3 in the active and expired arrays; tsk3 is marked I/O bound]
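The dynamic-priority rule can be sketched as follows. The constants `MAX_RT_PRIO`, `MAX_PRIO` and `MAX_BONUS` follow the 0-139 priority range and the ±5 bonus from the slides; `MAX_SLEEP`, the `sleep_avg` parameter and both function names are assumptions made for illustration. Note the sign convention: the sketch subtracts a bonus that grows with sleep time, which is the slide's "staticPrio + bonus" with the bonus sign flipped, so in both forms a task that sleeps more ends up with a numerically lower (better) priority.

```c
#include <assert.h>

enum {
    MAX_RT_PRIO = 100,  /* priorities 0..99 are real-time */
    MAX_PRIO    = 140,  /* priorities 100..139 map to nice -20..+19 */
    MAX_BONUS   = 10,   /* bonus spans -5..+5 */
    MAX_SLEEP   = 1000  /* hypothetical cap on the sleep average */
};

static int nice_to_static_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return MAX_RT_PRIO + nice + 20;
}

/* More sleep -> larger bonus -> numerically lower (better) priority. */
static int effective_prio(int static_prio, int sleep_avg)
{
    int bonus, prio;

    assert(sleep_avg >= 0 && sleep_avg <= MAX_SLEEP);
    bonus = sleep_avg * MAX_BONUS / MAX_SLEEP - MAX_BONUS / 2;
    prio = static_prio - bonus;
    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;      /* never enter the real-time range */
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;     /* never fall off the end of the array */
    return prio;
}
```

For a nice-0 task (static priority 120), a task that sleeps all the time lands at 115 and one that never sleeps lands at 125.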

• When all tasks have exhausted their time slices, the two priority arrays are exchanged.
[Figure: runqueue with tsk1, tsk2 and tsk3 in the active and expired arrays]

The O(1) scheduling algorithm
• sched_find_first_bit() finds the first set bit in the priority bitmap.
• Insert: O(1)
• Remove: O(1)
• Find first set bit: O(1)
[Figure: priority bitmap with three bits set, next to the queues holding tsk2 and tsk3]
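The three constant-time operations and the array swap can be sketched in user-space C. This is an illustrative model, not the kernel's `struct prio_array`: the singly linked FIFO queues, the 32-bit bitmap words and all function names are assumptions. The bitmap search costs at most a fixed number of steps that depends on the number of priority levels (140), not on the number of tasks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PRIO 140
#define BITMAP_WORDS ((MAX_PRIO + 31) / 32)

struct task {
    int prio;            /* 0 (highest) .. MAX_PRIO-1 (lowest) */
    struct task *next;   /* link within one priority queue */
};

struct prio_array {
    uint32_t bitmap[BITMAP_WORDS];  /* bit p set <=> queue p non-empty */
    struct task *head[MAX_PRIO];
    struct task *tail[MAX_PRIO];
};

struct runqueue {
    struct prio_array arrays[2];
    struct prio_array *active, *expired;
};

static void enqueue(struct prio_array *a, struct task *t)   /* O(1) */
{
    assert(t->prio >= 0 && t->prio < MAX_PRIO);
    t->next = NULL;
    if (a->tail[t->prio])
        a->tail[t->prio]->next = t;
    else
        a->head[t->prio] = t;
    a->tail[t->prio] = t;
    a->bitmap[t->prio / 32] |= UINT32_C(1) << (t->prio % 32);
}

/* Index of the first set bit, or -1; bounded by MAX_PRIO, not by task count. */
static int find_first_bit(const struct prio_array *a)
{
    for (int w = 0; w < BITMAP_WORDS; w++)
        for (int b = 0; a->bitmap[w] && b < 32; b++)
            if (a->bitmap[w] & (UINT32_C(1) << b))
                return w * 32 + b;
    return -1;
}

/* Remove and return the highest-priority task, or NULL if the array is empty. */
static struct task *dequeue_best(struct prio_array *a)      /* O(1) */
{
    int p = find_first_bit(a);
    struct task *t;

    if (p < 0)
        return NULL;
    t = a->head[p];
    a->head[p] = t->next;
    if (!a->head[p]) {
        a->tail[p] = NULL;
        a->bitmap[p / 32] &= ~(UINT32_C(1) << (p % 32));
    }
    return t;
}

/* When the active array drains, swap the two pointers: also O(1). */
static void swap_arrays(struct runqueue *rq)
{
    struct prio_array *tmp = rq->active;

    rq->active = rq->expired;
    rq->expired = tmp;
}
```

Swapping two pointers replaces the O(n) recalculation loop of the 2.4 scheduler, since each task's new timeslice is computed when it is moved to the expired array.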

Find first set bit: O(1)
    /* Binary search for the lowest set bit; the result is undefined for word == 0. */
    static inline unsigned long __ffs(unsigned long word)
    {
        int num = 0;

    #if BITS_PER_LONG == 64
        if ((word & 0xffffffff) == 0) {
            num += 32;
            word >>= 32;
        }
    #endif
        if ((word & 0xffff) == 0) {
            num += 16;
            word >>= 16;
        }
        if ((word & 0xff) == 0) {
            num += 8;
            word >>= 8;
        }
        if ((word & 0xf) == 0) {
            num += 4;
            word >>= 4;
        }
        if ((word & 0x3) == 0) {
            num += 2;
            word >>= 2;
        }
        if ((word & 0x1) == 0)
            num += 1;
        return num;
    }

2.6 SCHEDULER – CFS

2.6 scheduler – CFS
• The inventor of CFS set himself the goal of devising a scheduler capable of dividing the available CPU power fairly among all tasks.
• If one had an ideal multitasking computer capable of executing N processes concurrently, every process would get exactly 1/N of the available CPU power.

2.6 scheduler – CFS
• Classical schedulers compute time slices for each process in the system and allow each to run until its time slice (quantum) is used up.
  – After that, the time slices of all processes need to be recalculated.
• CFS considers only the wait time of a process.
  – The task with the most need for CPU time is scheduled.

2.6 scheduler – CFS [figure]

vruntime
• The vruntime variable stores the virtual runtime of a process, which is the actual runtime normalized by the number of runnable processes.
• The virtual runtime's units are nanoseconds, and therefore vruntime is decoupled from the timer tick.
• The virtual runtime is used to approximate the "ideal multitasking processor" that CFS is modeling.

Updating vruntime
• update_curr() calculates the execution time of the current process and stores that value in delta_exec.
• It then passes that runtime to __update_curr(), which weights the time by the number of runnable processes.
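The weighting step can be sketched in a few lines. One assumption is worth stating loudly: the sketch scales the elapsed time by the task's load weight (the quantity tabulated per nice value in `prio_to_weight`) relative to the nice-0 weight of 1024, rather than by a literal count of runnable processes. `struct sched_entity` here is a hypothetical, minimal stand-in, and the single function folds `update_curr()` and `__update_curr()` together.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024u   /* weight of a nice-0 task in prio_to_weight[] */

/* Hypothetical, minimal scheduling entity. */
struct sched_entity {
    uint64_t vruntime;     /* virtual runtime, ns */
    uint64_t exec_start;   /* timestamp when the task last got the CPU, ns */
    unsigned int weight;   /* load weight derived from the nice value */
};

/* Charge the time since exec_start to the entity, scaled by its weight. */
static void update_curr(struct sched_entity *se, uint64_t now)
{
    uint64_t delta_exec;

    assert(now >= se->exec_start && se->weight > 0);
    delta_exec = now - se->exec_start;
    se->vruntime += delta_exec * NICE_0_LOAD / se->weight;
    se->exec_start = now;
}
```

A nice-0 task that runs for 1 ms is charged exactly 1 ms of virtual runtime. A heavier task (weight 2048) is charged only 0.5 ms for the same wall-clock time, so it stays toward the left of the tree and is picked more often.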

Process Selection
• When CFS is deciding what process to run next, it picks the process with the smallest vruntime.
• CFS uses a red-black tree to manage the list of runnable processes and efficiently find the process with the smallest vruntime.

Adding Processes to the Tree
• This occurs when a process becomes runnable (wakes up) or is first created via fork().
    se->vruntime += cfs_rq->min_vruntime;
    update_curr(cfs_rq);
    account_entity_enqueue(cfs_rq, se);
    update_stats_enqueue(cfs_rq, se);
    __enqueue_entity(cfs_rq, se);

2.6 scheduler – issues
• Different priority levels for tasks (i.e., nice values) must be taken into account.
• Tasks must not be switched too often, because a context switch has a certain overhead.

2.6 scheduler – issues
• time slice: the CPU time slice that the task deserves
• period: the epoch length
• task_load: the weighted load of the task
• cfs_rq load: the weight of the fair queue
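The four quantities above are usually related by time slice = period × task_load / cfs_rq load. The formula itself did not survive on this slide, so the sketch below assumes that relation; `sched_slice_ns()` and its parameter names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* slice = period * task_load / cfs_rq_load (assumed relation, see lead-in) */
static uint64_t sched_slice_ns(uint64_t period_ns, unsigned int task_load,
                               unsigned int cfs_rq_load)
{
    assert(cfs_rq_load > 0 && task_load <= cfs_rq_load);
    return period_ns * task_load / cfs_rq_load;
}
```

With a 20 ms period, two nice-0 tasks (weight 1024 each) get 10 ms apiece. A nice-0 task sharing the queue with a nice-5 task (weight 335) gets about 15.07 ms, and the nice-5 task about 4.93 ms.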

Scheduling policies for I/O-bound tasks
• Every task that is woken up (i.e., an I/O-bound task) gets a virtual runtime equal to the smallest virtual runtime among the tasks in the queue.
  – Or: smallest virtual runtime - epsilon
• As a result, the I/O-bound tasks run first.
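The wake-up placement rule can be sketched as follows. The linear scan over an array is a deliberate simplification that stands in for the red-black tree, and `struct entity`, `min_vruntime()` and `place_on_wakeup()` are hypothetical names. The sketch follows the slide's rule literally and does not try to reproduce the kernel's own placement logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct entity {
    uint64_t vruntime;  /* virtual runtime, ns */
};

/* Smallest vruntime among n queued entities (linear scan, not an rbtree). */
static uint64_t min_vruntime(const struct entity *q, size_t n)
{
    uint64_t min;

    assert(q != NULL && n > 0);
    min = q[0].vruntime;
    for (size_t i = 1; i < n; i++)
        if (q[i].vruntime < min)
            min = q[i].vruntime;
    return min;
}

/* Place a woken task at the front: smallest vruntime, minus epsilon if given. */
static void place_on_wakeup(struct entity *woken, const struct entity *q,
                            size_t n, uint64_t epsilon)
{
    uint64_t min = min_vruntime(q, n);

    woken->vruntime = min > epsilon ? min - epsilon : 0;
}
```

Because selection always picks the smallest vruntime, a task placed this way is the next one to run.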

2.6 scheduler – fields in the task_struct [figure]

2.6 scheduler – fields in the task_struct
• prio and normal_prio indicate the dynamic priorities, and static_prio the static priority, of a process.
  – The static priority is the priority assigned to the process when it was started.
  – normal_prio and prio denote a priority that is computed based on the static priority and the scheduling policy of the process.

2.6 scheduler – fields in the task_struct
• cpus_allowed is a bit field used on multiprocessor systems to restrict the CPUs on which a process may run.
  – sched_setaffinity()
  – sched_getaffinity()

2.6 scheduler – priority [figure]

2.6 scheduler – priority
kernel/sched.c
    static const int prio_to_weight[40] = {
     /* -20 */     88761,     71755,     56483,     46273,     36291,
     /* -15 */     29154,     23254,     18705,     14949,     11916,
     /* -10 */      9548,      7620,      6100,      4904,      3906,
     /*  -5 */      3121,      2501,      1991,      1586,      1277,
     /*   0 */      1024,       820,       655,       526,       423,
     /*   5 */       335,       272,       215,       172,       137,
     /*  10 */       110,        87,        70,        56,        45,
     /*  15 */        36,        29,        23,        18,        15,
    };

2.6 scheduler – priority [figure]

Summary
• The concept of OS schedulers
• Maximize throughput.
  – This is what system administrators care about.
  – How to maximize throughput (CPU & I/O).
• What is the major drawback of the Linux 2.4 scheduler?
• To understand the pros and cons of the Linux 2.6 schedulers
  – O(1)
  – CFS

Final report topics
• task scheduling
• main memory management
• virtual memory & virtual memory space
• virtual file system & btrfs
• interrupt service routine & device drivers
• virtual machine (software approaches or hardware approaches)