Outline
• OS schedulers
• Unix scheduling
• Linux 2.4 scheduler
• Linux 2.6 scheduler
  – O(1) scheduler
  – CFS scheduler

Introduction: preemptive & cooperative multitasking
• A multitasking operating system is one that can interleave the execution of more than one process so that they appear to run simultaneously.
• Multitasking operating systems come in two flavors: cooperative multitasking and preemptive multitasking.
  – Linux provides preemptive multitasking.
  – Mac OS 9 and earlier are the most notable examples of cooperative multitasking.

Linux scheduler – Scheduling Policy
• Scheduling policy determines what runs when, balancing two goals:
  – fast process response time (low latency)
  – maximal system utilization (high throughput)
• Process classification:
  – I/O-bound processes spend much of their time submitting and waiting on I/O requests.
  – Processor-bound processes spend much of their time executing code.
• Unix variants tend to favor I/O-bound processes, thus providing good process response time.

Linux scheduler – Process Priority
• Linux’s priority-based scheduling
  – Rank processes based on their worth and need for processor time.
  – Both the user and the system may set a process's priority to influence the scheduling behavior of the system.
• Dynamic priority-based scheduling
  – Begins with an initial base priority.
  – Then enables the scheduler to increase or decrease the priority dynamically to fulfill scheduling objectives.
  – E.g., a process that is spending more time waiting on I/O will receive an elevated dynamic priority.

Linux scheduler – Priority Ranges
• Two separate priority ranges:
  – The nice value, from -20 to +19 with a default of 0.
    • Larger nice values correspond to a lower priority (you are being "nice" to the other processes on the system).
  – The real-time priority, which by default ranges from 0 to 99.
    • All real-time processes are at a higher priority than normal processes.
    • Linux implements real-time priorities in accordance with the POSIX standards on the matter.

2.4 scheduler
• Non-preemptible kernel
  – Set p->need_resched if schedule() should be invoked at the ‘next opportunity’ (the return from kernel mode to user mode).
• Round-robin
  – task_struct->counter: number of clock ticks left to run in this scheduling slice, decremented by a timer.

2.4 scheduler
1. Check whether schedule() was invoked from an interrupt handler (due to a bug) and panic if so.
2. Use spin_lock_irq() to lock ‘runqueue_lock’.
3. Check whether a task is ‘runnable’:
  – in TASK_RUNNING state, or
  – in TASK_INTERRUPTIBLE state with a signal pending.
4. Examine the ‘goodness’ of each process.
5. Context switch.

2.4 scheduler – ‘goodness’
• ‘goodness’: identifies the best candidate among all processes in the runqueue list.
  – ‘goodness’ = 0: the entity has exhausted its quantum.
  – 0 < ‘goodness’ < 1000: the entity is a conventional process/thread that has not exhausted its quantum; a higher value denotes a higher level of goodness.

2.4 scheduler – ‘goodness’
    if (p->mm == prev->mm)
        return p->counter + p->priority + 1;
    else
        return p->counter + p->priority;
• A small bonus is given to the task p if it shares the address space with the previous task.

2.4 scheduler - SMP
[Figure: run queue]

2.4 scheduler - SMP
• The scheduler examines the processor field of each process and gives a fixed bonus (PROC_CHANGE_PENALTY, usually 15) to a process that was last executed on the ‘this_cpu’ CPU.
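The selection described on the preceding slides can be sketched in a few lines of C. This is a simplified illustration, not the kernel's code: the `task` struct, the `pick_next()` helper, and the use of a plain `void *` for the address space are hypothetical, and the rule "goodness is 0 once the quantum is used up" is taken from the slide text rather than from the kernel source.

```c
#include <stddef.h>

#define PROC_CHANGE_PENALTY 15      /* "usually 15" according to the slide */

/* Hypothetical, stripped-down task descriptor for illustration only. */
struct task {
    int counter;                    /* clock ticks left in the current slice */
    int priority;                   /* static priority term from the slide */
    int processor;                  /* CPU on which the task last ran */
    const void *mm;                 /* stand-in for the address space */
};

/* Simplified goodness: 0 means the quantum is exhausted. */
static int goodness(const struct task *p, const struct task *prev, int this_cpu)
{
    int weight;

    if (p->counter == 0)
        return 0;
    weight = p->counter + p->priority;
    if (p->mm == prev->mm)          /* same address space as the previous task */
        weight += 1;
    if (p->processor == this_cpu)   /* last ran on this CPU */
        weight += PROC_CHANGE_PENALTY;
    return weight;
}

/* Linear scan over all tasks: this is the O(n) step of the 2.4 design.
 * Returns NULL when every task has goodness 0. */
static const struct task *pick_next(const struct task *tasks, size_t n,
                                    const struct task *prev, int this_cpu)
{
    const struct task *best = NULL;
    int best_weight = 0;

    for (size_t i = 0; i < n; i++) {
        int w = goodness(&tasks[i], prev, this_cpu);
        if (w > best_weight) {
            best_weight = w;
            best = &tasks[i];
        }
    }
    return best;
}
```

When `pick_next()` returns NULL, every runnable task has used up its quantum, which is the point at which the 2.4 scheduler recalculates all time slices.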

2.4 scheduler - performance
• The algorithm does not scale well.
  – It is inefficient to re-compute all dynamic priorities at once.
• The predefined quantum is too large for high system loads (for example, on a server).
• The I/O-bound process boosting strategy is not optimal.
  – It is a good strategy to ensure a short response time for interactive programs, but
  – some batch programs with almost no user interaction are also I/O-bound.

Recalculating Timeslices (kernel 2.4)
• Problems:
  – Recalculation can take a long time. Worse, it scales as O(n) for n tasks on the system.
  – Recalculation must occur under some sort of lock protecting the task list and the individual process descriptors. This results in high lock contention.
  – The nondeterminism of the recalculation is a problem for deterministic real-time programs.

Process classification
• Definition:
  – I/O-bound processes spend much of their time submitting and waiting on I/O requests.
  – Processor-bound processes spend much of their time executing code.
• Linux tends to favor I/O-bound processes, thus providing good process response time.
• How to classify processes?

[Figure: tasks grouped by remaining time quantum (tq): I/O-bound tasks with time quantum ≠ 0 at high priority, CPU-bound tasks with time quantum = 0]

Scheduling policy
• time_quantum_new = time_quantum_old / 2 + time_quantum_table[static_priority]
• dynamic_priority ≈ time_quantum_new
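A small sketch may help show why this recalculation rule favors I/O-bound tasks. The function names are hypothetical, and `base_quantum` stands in for the value looked up in time_quantum_table[static_priority]. A task that used up its quantum restarts from the base value, while a task that kept sleeping carries half of its unused quantum forward and approaches twice the base value.

```c
/* Recalculation rule from the slide:
 * new = old / 2 + time_quantum_table[static_priority].
 * base_quantum is the already looked-up table value (an assumption here). */
static int recalc_quantum(int old_quantum, int base_quantum)
{
    return old_quantum / 2 + base_quantum;
}

/* Quantum of a task that sleeps through `rounds` consecutive recalculations,
 * starting from an empty quantum. */
static int quantum_after_sleeping(int base_quantum, int rounds)
{
    int q = 0;

    for (int i = 0; i < rounds; i++)
        q = recalc_quantum(q, base_quantum);
    return q;
}
```

With a base quantum of 10 ticks, a CPU-bound task that always exhausts its slice gets 10 ticks again, whereas a task that sleeps through successive recalculations gets 10, 15, 17, 18 and then stays at 19 ticks with integer division.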

[Figure, repeated animation frames: I/O-bound tasks (time quantum ≠ 0) and CPU-bound tasks (time quantum = 0), with per-task labels tq = 0, tq ≠ 0 and tq = ?]

2.6 scheduler
[Figure: two run queues with task migration (put + pull) between them]

2.6 scheduler – User Preemption
• User preemption can occur
  – when returning to user-space from a system call
  – when returning to user-space from an interrupt handler

2.6 scheduler – Kernel Preemption
• The Linux kernel is a fully preemptive kernel.
  – It is possible to preempt a task at any point, so long as the kernel is in a state in which it is safe to reschedule.
  – “Safe to reschedule” means that the kernel does not hold a lock.
• The Linux design:
  – A preemption counter, preempt_count, is added to each process's thread_info.
  – This count increments once for each lock that is acquired and decrements once for each lock that is released.
• Kernel preemption can also occur explicitly, when a task in the kernel blocks or explicitly calls schedule().
  – In this case no additional logic is required to ensure that the kernel is in a state that is safe to preempt.
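The counter-based rule above can be illustrated with a minimal user-space sketch. The struct and function names are hypothetical and only model the bookkeeping described on the slide (a count of held locks plus a pending-reschedule flag); they are not the kernel's actual interfaces.

```c
#include <stdbool.h>

/* Hypothetical model of the per-task preemption state. */
struct thread_info_sketch {
    int preempt_count;      /* number of locks currently held */
    bool need_resched;      /* a reschedule has been requested */
};

/* The task may be preempted only while it holds no lock. */
static bool preemptible(const struct thread_info_sketch *ti)
{
    return ti->preempt_count == 0;
}

/* Called for each lock that is acquired. */
static void lock_acquired(struct thread_info_sketch *ti)
{
    ti->preempt_count++;
}

/* Called for each lock that is released. Returns true when the kernel has
 * just become preemptible again and a reschedule is pending, which is the
 * point at which schedule() would be invoked. */
static bool lock_released(struct thread_info_sketch *ti)
{
    ti->preempt_count--;
    return ti->preempt_count == 0 && ti->need_resched;
}
```

The design choice worth noting is that nesting comes for free: with two locks held, releasing the first leaves the count at 1, so preemption stays disabled until the last lock is dropped.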

Kernel Preemption
• Kernel preemption can occur
  – when an interrupt handler exits, before returning to kernel-space
  – when kernel code becomes preemptible again
  – if a task in the kernel explicitly calls schedule()
  – if a task in the kernel blocks (which results in a call to schedule())

O(1) & CFS scheduler
• 2.5 ~ 2.6.22: O(1) scheduler
  – Time complexity: O(1)
  – Uses a “run queue” (an active queue and an expired queue) to realize the ready queue
• 2.6.23 ~ present: Completely Fair Scheduler (CFS)
  – Time complexity: O(log n)
  – The ready queue is implemented as a red-black tree

O(1) scheduler
• Implement fully O(1) scheduling.
  – Every algorithm in the new scheduler (introduced in the 2.5 kernel) completes in constant time, regardless of the number of running processes.
• Implement perfect SMP scalability.
  – Each processor has its own locking and individual runqueue.
• Implement improved SMP affinity.
  – Attempt to group tasks to a specific CPU and continue to run them there.
  – Only migrate tasks from one CPU to another to resolve imbalances in runqueue sizes.
• Provide good interactive performance.
  – Even during considerable system load, the system should react and schedule interactive tasks immediately.
• Provide fairness.
  – No process should find itself starved of timeslice for any reasonable amount of time. Likewise, no process should receive an unfairly high amount of timeslice.
• Optimize for the common case of only one or two runnable processes, yet scale well to multiple processors, each with many processes.

The Priority Arrays
• Each runqueue contains two priority arrays (defined in kernel/sched.c as struct prio_array).
  – Active array: all tasks with timeslice left.
  – Expired array: all tasks that have exhausted their timeslice.
• Priority arrays provide O(1) scheduling.
  – Each priority array contains one queue of runnable processes per priority level.
  – The priority arrays also contain a priority bitmap used to efficiently discover the highest-priority runnable task in the system.

The Linux O(1) scheduler algorithm


• Each runqueue contains two priority arrays: active and expired.
• Each of these priority arrays contains a list of tasks indexed according to priority.
[Figure: runqueue with active and expired arrays, each a priority queue indexed 0-139]

• Linux assigns higher-priority tasks longer time slices.
• Time quantum ≈ 1/priority (a lower priority number means a higher priority)
[Figure: runqueue with active and expired arrays holding tsk 1, tsk 2 and tsk 3]

• Linux chooses the task with the highest priority from the active array for execution.
• Round-robin
[Figure, animation frames: runqueue with tsk 1, tsk 2 and tsk 3 moving between the active and expired arrays]

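• Most tasks have dynamic priorities that are based on their “nice” value (static priority) plus or minus 5.
• The interactivity of a task is estimated from its sleep time: a task that sleeps more is treated as more interactive.
  – dynPrio = staticPrio + bonus, with bonus in the range -5 ~ +5
  – bonus ≈ 1/sleep_time (the more a task sleeps, the lower its bonus and thus the higher its priority)
[Figure: runqueue with tsk 1, tsk 2 and the I/O-bound tsk 3 in the active and expired arrays]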
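The dynamic priority rule (static priority plus a bonus between -5 and +5) can be sketched as follows. This is an illustration under stated assumptions, not the kernel's implementation: the linear mapping from average sleep time to bonus, the `MAX_SLEEP_AVG_MS` cap, and the function names are hypothetical. The mapping of nice values onto priorities 100 to 139, below the real-time range 0 to 99, follows the priority ranges given earlier in the slides.

```c
#define MAX_RT_PRIO       100   /* priorities 0..99 are real-time */
#define MAX_PRIO          140   /* priorities run from 0 to 139 */
#define MAX_BONUS         10    /* bonus spans 10 levels: -5 .. +5 */
#define MAX_SLEEP_AVG_MS  1000  /* hypothetical cap on tracked sleep time */

/* nice -20..+19 maps onto static priorities 100..139. */
static int nice_to_static_prio(int nice)
{
    return MAX_RT_PRIO + 20 + nice;
}

/* dynPrio = staticPrio + bonus, where the bonus falls from +5 (never
 * sleeps) to -5 (sleeps most of the time). The linear scaling is an
 * assumption made for this sketch. */
static int effective_prio(int static_prio, int sleep_avg_ms)
{
    int bonus, prio;

    if (sleep_avg_ms < 0)
        sleep_avg_ms = 0;
    if (sleep_avg_ms > MAX_SLEEP_AVG_MS)
        sleep_avg_ms = MAX_SLEEP_AVG_MS;

    bonus = MAX_BONUS / 2 - sleep_avg_ms * MAX_BONUS / MAX_SLEEP_AVG_MS;
    prio = static_prio + bonus;

    /* A normal task never leaves the non-real-time range. */
    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}
```

For a nice-0 task (static priority 120), a task that never sleeps ends up at 125 and a task that sleeps almost all the time ends up at 115, so the interactive task is picked first from the active array.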

• When all tasks have exhausted their time slices, the two priority arrays are exchanged!
[Figure: runqueue with tsk 1, tsk 2 and tsk 3 in the active and expired arrays]

The O(1) scheduling algorithm
• sched_find_first_bit()
[Figure: priority bitmap with set bits (1 1 1) pointing to the queued tasks tsk 3 and tsk 2]

The O(1) scheduling algorithm
• Insert: O(1)
• Remove: O(1)
• Find first set bit: O(1)
[Figure: priority bitmap with set bits (1 1 1)]

find first set bit: O(1)
    /* Binary search for the lowest set bit; word must be non-zero. */
    static inline unsigned long __ffs(unsigned long word)
    {
        int num = 0;

    #if BITS_PER_LONG == 64
        if ((word & 0xffffffff) == 0) {
            num += 32;
            word >>= 32;
        }
    #endif
        if ((word & 0xffff) == 0) {
            num += 16;
            word >>= 16;
        }
        if ((word & 0xff) == 0) {
            num += 8;
            word >>= 8;
        }
        if ((word & 0xf) == 0) {
            num += 4;
            word >>= 4;
        }
        if ((word & 0x3) == 0) {
            num += 2;
            word >>= 2;
        }
        if ((word & 0x1) == 0)
            num += 1;
        return num;
    }
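To see how the bitmap, the per-priority queues, and the active/expired swap fit together, here is a compact sketch. It is a simplification under stated assumptions: all `*_sketch` names are hypothetical, a per-priority counter stands in for the real per-priority task list, and the bitmap uses fixed 32-bit words. Every operation touches a bounded amount of data (at most five bitmap words), independent of the number of tasks.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PRIO      140
#define BITMAP_WORDS  ((MAX_PRIO + 31) / 32)

/* Hypothetical simplified priority array. */
struct prio_array_sketch {
    unsigned int nr_active;
    uint32_t bitmap[BITMAP_WORDS];     /* bit set = queue is non-empty */
    unsigned int queued[MAX_PRIO];     /* task count per priority level */
};

struct runqueue_sketch {
    struct prio_array_sketch arrays[2];
    struct prio_array_sketch *active, *expired;
};

static void rq_init(struct runqueue_sketch *rq)
{
    memset(rq, 0, sizeof *rq);
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

/* Insert: O(1). */
static void enqueue(struct prio_array_sketch *a, int prio)
{
    a->queued[prio]++;
    a->bitmap[prio / 32] |= (uint32_t)1 << (prio % 32);
    a->nr_active++;
}

/* Remove: O(1). The caller must only dequeue a priority that is queued. */
static void dequeue(struct prio_array_sketch *a, int prio)
{
    if (--a->queued[prio] == 0)
        a->bitmap[prio / 32] &= ~((uint32_t)1 << (prio % 32));
    a->nr_active--;
}

/* Lowest set bit of a non-zero 32-bit word, by binary search. */
static int ffs32(uint32_t word)
{
    int num = 0;

    if ((word & 0xffff) == 0) { num += 16; word >>= 16; }
    if ((word & 0xff) == 0)   { num += 8;  word >>= 8; }
    if ((word & 0xf) == 0)    { num += 4;  word >>= 4; }
    if ((word & 0x3) == 0)    { num += 2;  word >>= 2; }
    if ((word & 0x1) == 0)    num += 1;
    return num;
}

/* Highest-priority (lowest-numbered) non-empty queue, or -1 if none. */
static int find_first_prio(const struct prio_array_sketch *a)
{
    for (int i = 0; i < BITMAP_WORDS; i++)
        if (a->bitmap[i])
            return i * 32 + ffs32(a->bitmap[i]);
    return -1;
}

/* Exchanging the arrays is a pointer swap, so it is O(1) as well. */
static void swap_arrays(struct runqueue_sketch *rq)
{
    struct prio_array_sketch *tmp = rq->active;

    rq->active = rq->expired;
    rq->expired = tmp;
}
```

A scheduler built on this would call `find_first_prio(rq->active)`, and when that returns -1 it would call `swap_arrays()` and search again, which replaces the O(n) recalculation loop of the 2.4 design.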

2.6 scheduler – CFS
• Classical schedulers compute time slices for each process in the system and allow each process to run until its time slice/quantum is used up.
  – After that, the time slices of all processes need to be recalculated.
• CFS considers only the wait time of a process.
  – The task with the most need for CPU time is scheduled.


2.6 scheduler – CFS (motivation)
• Traditional Linux scheduling policy
[Figure: tasks labeled HP and LP (HP: high priority, LP: low priority)]
• Traditional Unix scheduling policy
[Figure: tasks labeled WT (WT: waiting time)]

2.6 scheduler – CFS (the ideal case)
[Figure: task virtualization; a high-priority task advances fast, a low-priority task advances slowly]
[Figure: tasks labeled 8 and 3 with their speed]
[Figure, animation frames: tasks 8, 8, 8, 3, 3, 3 laid out along a time axis]

2.6 scheduler – CFS (the implementation)
[Figure, animation frames: tasks 8, 8, 8, 3, 3, 3 laid out along a time axis]

2.6 scheduler – the RB-Tree
• To sort tasks on the red-black tree, the kernel uses the difference fair_clock - wait_runtime.
  – fair_clock is a measure of the CPU time a task would have received if scheduling were completely fair.
  – wait_runtime is a direct measure of the unfairness caused by the imperfection of real systems.
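The sort key can be made concrete with a short sketch. The struct and function names are hypothetical, a linear scan over an array stands in for the red-black tree, and the choice of the smallest key (the entry a sorted tree would keep leftmost) as the next task to run is an assumption consistent with "the task with the most need for CPU time is scheduled": the longer a task has waited, the larger its wait_runtime and the smaller its key.

```c
#include <stddef.h>

/* Hypothetical per-task bookkeeping for illustration. */
struct cfs_task_sketch {
    long long fair_clock;    /* CPU time under perfectly fair scheduling */
    long long wait_runtime;  /* accumulated unfairness (time owed) */
};

/* Key used to order tasks: fair_clock - wait_runtime. */
static long long rb_key(const struct cfs_task_sketch *t)
{
    return t->fair_clock - t->wait_runtime;
}

/* Index of the task with the smallest key, or -1 if there are no tasks.
 * A red-black tree would find this entry in O(log n) by keeping it
 * leftmost; the O(n) scan here only keeps the example short. */
static int pick_leftmost(const struct cfs_task_sketch *tasks, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++)
        if (best < 0 || rb_key(&tasks[i]) < rb_key(&tasks[best]))
            best = (int)i;
    return best;
}
```

For three tasks with the same fair_clock of 100 and wait_runtime values of 5, 30 and -10, the keys are 95, 70 and 110, so the second task, which has been treated most unfairly, is chosen.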

2.6 scheduler – issues
• Different priority levels for tasks (i.e., nice values) must be taken into account.
• Tasks must not be switched too often, because a context switch has a certain overhead.

2.6 scheduler – fields in the task_struct

2.6 scheduler – fields in the task_struct
• prio and normal_prio indicate the dynamic priorities; static_prio indicates the static priority of a process.
  – The static priority is the priority assigned to the process when it was started.
  – normal_prio denotes a priority that is computed based on the static priority and the scheduling policy of the process.

2.6 scheduler – fields in the task_struct
• The scheduler is not limited to scheduling processes, but can also work with larger entities. This allows for implementing group scheduling.

2.6 scheduler – fields in the task_struct
• cpus_allowed is a bit field used on multiprocessor systems to restrict the CPUs on which a process may run.
  – sched_setaffinity()
  – sched_getaffinity()
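From user space, the allowed-CPU mask is reachable on Linux through the glibc wrappers sched_getaffinity() and sched_setaffinity(). The sketch below is a minimal illustration; the two helper function names are hypothetical, and it assumes a Linux system with glibc (the `cpu_set_t` macros require `_GNU_SOURCE`).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <sched.h>

/* Number of CPUs the calling thread may run on, or -1 on error. */
static int allowed_cpu_count(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)   /* pid 0 = caller */
        return -1;
    return CPU_COUNT(&set);
}

/* Restrict the calling thread to the lowest-numbered CPU it is currently
 * allowed to use. Returns 0 on success, -1 on error. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t set, one;

    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one);
        }
    }
    return -1;
}
```

Choosing a CPU from the mask that is already allowed, rather than hard-coding CPU 0, keeps the call valid inside containers or cpusets where CPU 0 may not be available.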


2.6 scheduler – priority
kernel/sched.c:
    static const int prio_to_weight[40] = {
        /* -20 */ 88761, 71755, 56483, 46273, 36291,
        /* -15 */ 29154, 23254, 18705, 14949, 11916,
        /* -10 */  9548,  7620,  6100,  4904,  3906,
        /*  -5 */  3121,  2501,  1991,  1586,  1277,
        /*   0 */  1024,   820,   655,   526,   423,
        /*   5 */   335,   272,   215,   172,   137,
        /*  10 */   110,    87,    70,    56,    45,
        /*  15 */    36,    29,    23,    18,    15,
    };
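One way to read this table is as relative CPU shares: assuming that two runnable tasks receive CPU time in proportion to their weights, the share of each follows directly from the table. The helper names below are hypothetical, and the proportional-share interpretation is the assumption being illustrated.

```c
/* Weight table as shown on the slide (index 0 corresponds to nice -20). */
static const int prio_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* nice must be in the range -20..+19. */
static int nice_to_weight(int nice)
{
    return prio_to_weight[nice + 20];
}

/* CPU share, in parts per thousand, of a task with the given nice value
 * when it competes with exactly one other task. */
static int cpu_share_permille(int nice, int other_nice)
{
    long w = nice_to_weight(nice);
    long o = nice_to_weight(other_nice);

    return (int)(w * 1000 / (w + o));
}
```

For example, a nice-0 task competing with a nice-1 task gets 1024 / (1024 + 820), about 55.5% of the CPU, so one nice level shifts roughly 10% of the CPU between two tasks.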


Summary
• The concept of OS schedulers
• Maximize throughput.
  – This is what system administrators care about.
  – How to maximize throughput (CPU & I/O).
• What is the major drawback of the Linux 2.4 scheduler?
• Pros and cons of the Linux 2.6 schedulers
  – O(1)
  – CFS