Real-Time Scheduling Analysis for Multiprocessor Platforms, Marko Bertogna
Real-Time Scheduling Analysis for Multiprocessor Platforms
Marko Bertogna
Ph.D. dissertation, Scuola Superiore S. Anna, Pisa, Italy
Overview
- The Multicore Revolution
- Real-Time Multiprocessor Systems: existing results
- Schedulability Analysis for global schedulers
- Experimental evaluation
- Conclusions
- Other research activities
19/05/2008 Marko Bertogna - Ph.D. dissertation
Main Contributions
- Systematization of existing results for real-time (RT) scheduling and schedulability analysis on multiprocessors (MP)
- Polynomial and pseudo-polynomial schedulability tests for:
  - Work-conserving schedulers
  - FP
  - EDF
  - EDZL
- Experimental comparison of existing techniques
Real-Time Systems
- Solid theory of single-processor systems
  - Optimal schedulers, tight schedulability tests, shared resource protocols, bandwidth reservation schemes, hierarchical schedulers, OS, etc.
- Far fewer results for multiprocessors
  - Many NP-hard problems, few optimal results, heuristic approaches, simplified task models, only sufficient schedulability tests, etc.
Do we really need to investigate multiprocessor real-time systems?
As Moore’s law goes on…
- The number of transistors per chip doubles every 18 to 24 months
…heating becomes a problem
- P ∝ V f: clock speed limited to less than 4 GHz
- Pentium Tejas cancelled!
[Figure: power (W) by year for Intel processors from the 4004 to the P4, with hot-plate and nuclear-reactor reference levels]
Solution
- Use a higher number of slower logic gates
- Denser chips with transistors operating at lower frequencies
MULTICORE SYSTEMS
The Multicore invasion
- Intel’s Core 2, Itanium, Xeon: 2, 4 cores
- AMD’s Opteron, Athlon 64 X2, Phenom: 2, 4 cores
- IBM-Toshiba-Sony Cell processor: 8 cores (PlayStation 3)
- Microsoft’s Xenon: 3 cores (Xbox 360)
- ARM’s MPCore: 4 cores
- Sun’s Niagara UltraSPARC: 8 cores
- Tilera’s TILE64: 64 cores
- Nios II: x soft cores
- TI, Freescale, Atmel, Broadcom, Picochip (picoArray, up to 300 DSP cores), ...
Identical vs heterogeneous cores
- ARM’s MPCore: 4 identical ARMv6 cores
- STI’s Cell Processor: one Power Processor Element (PPE) and 8 Synergistic Processing Elements (SPE)
System model
- Platform with m identical processors
- Task set τ with n periodic or sporadic tasks τi
  - Period or minimum inter-arrival time Ti
  - Worst-case execution time Ci
  - Deadline Di
  - Utilization Ui = Ci/Ti, density λi = Ci/min(Di, Ti)
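The task parameters above can be captured in a small data structure. This is a minimal sketch; the class name `Task` and its field names are illustrative choices, and the three example tasks are arbitrary (C, D, T) values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A periodic or sporadic task (names are illustrative, not from the slides)."""
    wcet: float      # C_i, worst-case execution time
    deadline: float  # D_i, relative deadline
    period: float    # T_i, period or minimum inter-arrival time

    @property
    def utilization(self) -> float:
        # U_i = C_i / T_i
        return self.wcet / self.period

    @property
    def density(self) -> float:
        # lambda_i = C_i / min(D_i, T_i)
        return self.wcet / min(self.deadline, self.period)


tasks = [Task(1, 1, 2), Task(1, 1, 3), Task(5, 6, 6)]
total_utilization = sum(t.utilization for t in tasks)
total_density = sum(t.density for t in tasks)
```

For constrained deadlines (Di ≤ Ti) the density is never smaller than the utilization, which is why density-based tests are the more pessimistic of the two.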
Problems addressed
- Run-time scheduling problem
- Schedulability problem
[Figure: tasks τ1 to τ5 to be assigned to CPU1, CPU2, CPU3]
Assumptions
- Independent tasks
- Job-level parallelism prohibited: the same job cannot execute on more than one processor at the same time
- Preemption and migration support: a preempted task can resume its execution on a different processor
- Cost of preemption/migration integrated into task WCET
Global vs partitioned scheduling
- Single system-wide queue or multiple per-processor queues
[Figure: global scheduler with one queue (τ1 to τ5) feeding CPU1, CPU2, CPU3; partitioned scheduler with a separate task queue for each of CPU1, CPU2, CPU3]
Partitioned Scheduling
- The scheduling problem reduces to:
  - Bin-packing problem: NP-hard in the strong sense; various heuristics used: FF, NF, BF, FFDU, BFDD, etc.
  - Uniprocessor scheduling problem: well known; EDF (Utot ≤ 1), RM (RTA), ...
- Global (work-conserving) and partitioned approaches are incomparable
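The reduction to bin packing plus a uniprocessor test can be sketched as follows. This is one possible first-fit heuristic in decreasing-utilization order, assuming implicit deadlines so that the per-processor EDF test is Utot ≤ 1; the function name `first_fit_edf` is hypothetical.

```python
def first_fit_edf(utilizations, m):
    """Assign tasks to m processors with first-fit, largest utilization first.

    Each processor accepts a task only if its total utilization stays <= 1,
    the exact EDF test for implicit deadlines on a uniprocessor.
    Returns a list of m lists of task indices, or None if a task does not fit.
    """
    load = [0.0] * m
    bins = [[] for _ in range(m)]
    order = sorted(range(len(utilizations)), key=lambda i: -utilizations[i])
    for i in order:
        for p in range(m):
            # Small tolerance guards against floating-point rounding.
            if load[p] + utilizations[i] <= 1.0 + 1e-12:
                load[p] += utilizations[i]
                bins[p].append(i)
                break
        else:
            return None  # no processor can host task i
    return bins


balanced = first_fit_edf([0.5, 0.5, 0.5, 0.5], m=2)
unpartitionable = first_fit_edf([0.6, 0.6, 0.6], m=2)
```

The second call fails even though the total utilization (1.8) is below m = 2: no two of the three tasks fit on one processor, a case a partitioned approach cannot handle.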
Global scheduling
- The m highest-priority ready jobs are always the ones executing
- Work-conserving scheduler: no processor is ever idle when a task is ready to execute
[Figure: single queue (τ5 to τ1) feeding CPU1 (τ1), CPU2 (τ2), CPU3 (τ3)]
Global scheduling: advantages
- Load automatically balanced
- Easier re-scheduling (dynamic loads, selective shutdown, etc.)
- Lower average response time (see queueing theory)
- More efficient reclaiming and overload management
- Number of preemptions
- Migration cost: can be mitigated by proper HW (e.g., MPCore’s Direct Data Intervention)
- Few schedulability tests: further research needed
Uniprocessor scheduling
- EDF optimal for arbitrary job collections
- Exact schedulability conditions
  - Linear test for implicit deadlines: Utot ≤ 1
  - Pseudo-polynomial test for constrained and arbitrary deadlines [Baruah et al. 90]
- Optimal priority assignments for sporadic and synchronous periodic task systems
  - RM for implicit deadlines
  - DM for constrained deadlines
- Exact pseudo-polynomial schedulability test for FP
  - Response Time Analysis (RTA)
Global Scheduling
- No optimal scheduler known for general task models
- Pfair optimal for implicit deadlines: Utot ≤ m
  - Preemption and synchronization issues
- Classic schedulers are not optimal (Dhall’s effect): with m light tasks and 1 heavy task, a deadline can be missed with Utot → 1
- Hybrid schedulers: EDF-US, RM-US, DM-DS, AdaptiveTkC, fpEDF, EDF(k), EDZL, …
Global scheduling: main results
- Only sufficient schedulability tests
- Utilization-based tests (implicit deadlines)
  - EDF, Goossens et al.: Utot ≤ m(1 − Umax) + Umax
  - fpEDF, Baruah: Utot ≤ (m + 1)/2
  - RM-US, Andersson et al.: Utot ≤ m²/(3m − 2)
- Polynomial tests
  - EDF, FP, Baker: O(n²) and O(n³) tests
  - EDZL, Cirinei, Baker: O(n²) test
- Pseudo-polynomial tests
  - EDF, FP, Fisher, Baruah: load-based tests
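The three utilization bounds listed above are simple closed forms. A minimal sketch follows, assuming implicit deadlines; the function names are illustrative and the example task set is arbitrary.

```python
def edf_bound(m, u_max):
    """Global EDF bound of Goossens et al.: m(1 - Umax) + Umax."""
    return m * (1 - u_max) + u_max


def fpedf_bound(m):
    """fpEDF bound of Baruah: (m + 1) / 2."""
    return (m + 1) / 2


def rm_us_bound(m):
    """RM-US bound of Andersson et al.: m^2 / (3m - 2)."""
    return m * m / (3 * m - 2)


# Example: four tasks of utilization 0.5 on m = 4 processors.
utilizations = [0.5, 0.5, 0.5, 0.5]
m = 4
u_tot = sum(utilizations)
passes_edf = u_tot <= edf_bound(m, max(utilizations))
passes_fpedf = u_tot <= fpedf_bound(m)
passes_rm_us = u_tot <= rm_us_bound(m)
```

All three tests are only sufficient: failing the RM-US bound here does not prove that the task set is unschedulable.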
Density-based tests
- EDF: λtot ≤ m(1 − λmax) + λmax
- EDF-DS[1/2]: λtot ≤ (m + 1)/2 [ECRTS’05]
  - Gives highest priority to (at most m − 1) tasks having density ≥ 1/2, and schedules the remaining ones with EDF
- DM: λtot ≤ m(1 − λmax)/2 + λmax
- DM-DS[1/3]: λtot ≤ (m + 1)/3 [OPODIS’05]
  - Gives highest priority to (at most m − 1) tasks having density ≥ 1/3, and schedules the remaining ones with DM (only constrained deadlines)
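The EDF and DM density conditions can be checked in the same way as the utilization bounds. The sketch below assumes tasks are given as (C, D, T) tuples; the function names and the example values are illustrative.

```python
def density(c, d, t):
    """lambda_i = C_i / min(D_i, T_i)."""
    return c / min(d, t)


def edf_density_test(tasks, m):
    """Sufficient test for global EDF: ltot <= m(1 - lmax) + lmax."""
    dens = [density(*task) for task in tasks]
    return sum(dens) <= m * (1 - max(dens)) + max(dens)


def dm_density_test(tasks, m):
    """Sufficient test for global DM: ltot <= m(1 - lmax)/2 + lmax."""
    dens = [density(*task) for task in tasks]
    return sum(dens) <= m * (1 - max(dens)) / 2 + max(dens)


# Three constrained-deadline tasks, each with density 0.5, on 2 processors.
tasks = [(1, 2, 4), (1, 2, 4), (2, 4, 8)]
edf_ok = edf_density_test(tasks, m=2)
dm_ok = dm_density_test(tasks, m=2)
```

The same task set passes the EDF condition (1.5 ≤ 1.5) but not the DM one (1.5 > 1.0), which reflects the factor 1/2 in the DM bound.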
Critical instant
- A particular configuration of releases that leads to the largest possible response time of a task
- Possible to derive exact schedulability tests analyzing just the critical instant situation
- Uniprocessor FP and EDF: a critical instant is when
  - all tasks arrive synchronously
  - all jobs are released as soon as permitted
- Response Time Analysis for uniprocessors
  - FP: the response time of task k is given by the fixed point of Rk in the iteration
    $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k / T_i \rceil\, C_i$
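The fixed-point iteration for uniprocessor FP can be sketched as below. It assumes the classic recurrence R_k = C_k + Σ_{i<k} ⌈R_k/T_i⌉ C_i, constrained deadlines, integer parameters, and tasks listed in decreasing priority order; the function name `response_time` is illustrative.

```python
from math import ceil


def response_time(tasks, k):
    """Worst-case response time of task k under uniprocessor fixed priority.

    tasks: list of (C, D, T), index 0 = highest priority.
    Returns the fixed point R_k, or None if the iteration exceeds D_k.
    """
    c_k, d_k, _ = tasks[k]
    r = c_k
    while True:
        # Interference from all higher-priority tasks released in [0, r).
        nxt = c_k + sum(ceil(r / t_i) * c_i for c_i, _, t_i in tasks[:k])
        if nxt > d_k:
            return None
        if nxt == r:
            return r
        r = nxt


tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
response_times = [response_time(tasks, k) for k in range(len(tasks))]
```

For the lowest-priority task the iteration visits 3, 6, 7, 9, 10 and stops at 10, which is within its deadline of 12.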
Multiprocessor anomaly
- Synchronous periodic arrival of jobs is not a critical instant for multiprocessors
- Example from [Bar07]: τ1 = (1, 1, 2), τ2 = (1, 1, 3), τ3 = (5, 6, 6)
[Figure: two schedules, the synchronous periodic situation and the situation with the second job of τ2 delayed by one unit]
- Need to find pessimistic situations to derive sufficient schedulability tests
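The anomaly can be reproduced with a small discrete-time simulation. The sketch below makes several assumptions not stated on the slide: tuples are read as (C, D, T), the platform has m = 2 processors, the scheduler is global fixed priority with τ1 highest, and time advances in unit slots. The function name `deadline_misses` is hypothetical.

```python
def deadline_misses(tasks, releases, m, horizon):
    """Simulate global fixed-priority scheduling in unit time slots.

    tasks: list of (C, D, T), index 0 = highest priority.
    releases: releases[i] lists the release times of the jobs of task i.
    Returns the sorted indices of tasks with a job that misses its deadline.
    """
    jobs = [
        {"task": i, "release": r, "deadline": r + d, "left": c}
        for i, (c, d, _t) in enumerate(tasks)
        for r in releases[i]
    ]
    missed = set()
    for now in range(horizon):
        ready = [j for j in jobs if j["release"] <= now and j["left"] > 0]
        ready.sort(key=lambda j: j["task"])
        for job in ready[:m]:  # the m highest-priority ready jobs execute
            job["left"] -= 1
        for job in jobs:       # check deadlines at the end of the slot
            if job["deadline"] == now + 1 and job["left"] > 0:
                missed.add(job["task"])
    return sorted(missed)


tasks = [(1, 1, 2), (1, 1, 3), (5, 6, 6)]
synchronous = deadline_misses(tasks, [[0, 2, 4], [0, 3], [0]], m=2, horizon=6)
delayed = deadline_misses(tasks, [[0, 2, 4], [0, 4], [0]], m=2, horizon=6)
```

In the synchronous case τ3 completes exactly at time 6. When the second job of τ2 arrives at time 4 instead of 3 (still a legal sporadic arrival), it coincides with a job of τ1, both processors are busy in [4, 5), and τ3 only receives 4 of its 5 units by its deadline.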
Introducing the interference
- Ik = total interference suffered by task τk
- I_k^i = interference of task τi on task τk
[Figure: schedule on CPU1, CPU2, CPU3 over the interval from rk to rk + Rk, showing the execution of τk and the interfering contributions I_k^i]
Limiting the interference
- It is sufficient to consider at most the portion (Rk − Ck + 1) of each term I_k^i in the sum
[Figure: schedule on CPU1, CPU2, CPU3 over the interval from rk to rk + Rk, with each interfering contribution truncated]
- It can be proved that WCRTk is given by the fixed point of:
  $R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(I_k^i(R_k),\; R_k - C_k + 1\right) \right\rfloor$
Bounding the interference
- Exactly computing the interference is complex
- Pessimistic assumptions:
  1. Bound the interference of a task with its workload: $I_k^i(R_k) \le W_i(R_k)$
  2. Use an upper bound on the workload
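Bounding the workload
Consider a situation in which:
- The first job executes as close as possible to its deadline
- Successive jobs execute as soon as possible
[Figure: timeline of the jobs of τi over a window of length L, annotated with Ci, Di, Ti and the last-job contribution εi]
$W_i(L) = N_i(L)\, C_i + \varepsilon_i$
where:
- $N_i(L) = \lfloor (L + D_i - C_i) / T_i \rfloor$ (number of jobs, excluding the last one)
- $\varepsilon_i = \min\left(C_i,\; L + D_i - C_i - N_i(L)\, T_i\right)$ (last job)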
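The workload bound for this densest arrangement can be sketched in a few lines. The formula used here, W_i(L) = N_i(L) C_i + min(C_i, L + D_i − C_i − N_i(L) T_i) with N_i(L) = ⌊(L + D_i − C_i)/T_i⌋, is an assumption reconstructed from the slide annotations; integer parameters are assumed and the function name is illustrative.

```python
def workload_bound(c, d, t, length):
    """Upper bound on the work of a task (C, D, T) in a window of given length.

    The first job finishes right at its deadline, the following ones run as
    soon as they are released.
    """
    shifted = length + d - c
    n_jobs = shifted // t                 # jobs fully inside the window
    last = min(c, shifted - n_jobs * t)   # contribution of the last job
    return n_jobs * c + last


w_short = workload_bound(1, 1, 2, 5)   # jobs at [0,1), [2,3), [4,5)
w_carry = workload_bound(2, 4, 5, 6)   # carry-in job [0,2), then [3,5)
```

Note that the bound can exceed the window length for tasks with large Ci, which is why the analysis also caps each contribution at Rk − Ck + 1.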
RTA for generic global schedulers
- An upper bound on the WCRT of task k is given by the fixed point of Rk in the iteration:
  $R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(W_i(R_k),\; R_k - C_k + 1\right) \right\rfloor$
- The slack of task k is at least:
  $S_k = D_k - R_k$
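The iteration for a generic work-conserving scheduler can be sketched as follows. It assumes the recurrence R_k = C_k + ⌊(1/m) Σ_{i≠k} min(W_i(R_k), R_k − C_k + 1)⌋ with the workload bound described on the previous slide, integer parameters, and constrained deadlines; both function names are illustrative.

```python
def workload_bound(c, d, t, length):
    """Workload bound: first job as late as possible, the rest as soon as possible."""
    shifted = length + d - c
    n_jobs = shifted // t
    return n_jobs * c + min(c, shifted - n_jobs * t)


def rta_work_conserving(tasks, m, k):
    """Response-time upper bound for task k on m processors, or None.

    tasks: list of (C, D, T). Valid for any work-conserving global scheduler,
    so every other task is treated as potentially interfering.
    """
    c_k, d_k, _ = tasks[k]
    r = c_k
    while r <= d_k:
        cap = r - c_k + 1  # interference beyond this cannot delay task k further
        total = sum(
            min(workload_bound(c, d, t, r), cap)
            for i, (c, d, t) in enumerate(tasks)
            if i != k
        )
        nxt = c_k + total // m
        if nxt == r:
            return r
        r = nxt
    return None


tasks = [(1, 4, 4), (1, 4, 4), (2, 6, 6)]
bounds = [rta_work_conserving(tasks, 2, k) for k in range(len(tasks))]
slacks = [d - r for (_c, d, _t), r in zip(tasks, bounds)]
```

Every bound is within the corresponding deadline, so under these assumptions the example set is schedulable on two processors by any work-conserving global scheduler.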
Improvement using slack values
Consider a situation in which:
- The first job executes as close as possible to its deadline
- Successive jobs execute as soon as possible
[Figure: timeline of the jobs of τi over a window of length L, annotated with Ci, Di, Ti; the first job now completes Si before its deadline]
$W_i(L) = N_i(L)\, C_i + \min\left(C_i,\; L + D_i - C_i - S_i - N_i(L)\, T_i\right)$
where:
$N_i(L) = \lfloor (L + D_i - C_i - S_i) / T_i \rfloor$
RTA for generic global schedulers
- An upper bound on the WCRT of task k is given by the fixed point of Rk in the iteration, with the workload bound computed using the slack values
- If a fixed point Rk ≤ Dk is reached for every task k in the system, the task set is schedulable with any work-conserving global scheduler
Iterative schedulability test
1. All slacks initialized to zero
2. Compute the slack lower bound for tasks 1, …, n
   - If higher than the old value, update the slack bound
   - If lower, do nothing
3. If all tasks have a positive slack lower bound, return success
4. If no slack has been updated for tasks 1, …, n, return fail
5. Otherwise, return to step 2
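The five steps above can be sketched as a loop around the response-time iteration. This is a sketch under assumptions: the workload bound is taken as the slack-aware form W_i(L) with the window shifted by S_i, parameters are integers, a task counts as verified once a fixed point R_k ≤ D_k is found (so a slack of zero is accepted), and all function names are illustrative.

```python
def workload_bound(c, d, t, length, slack=0):
    """Workload bound where the first job finishes `slack` before its deadline."""
    shifted = length + d - c - slack
    n_jobs = shifted // t
    return n_jobs * c + min(c, shifted - n_jobs * t)


def response_bound(tasks, m, k, slack):
    """Fixed point of the response-time iteration for task k, or None."""
    c_k, d_k, _ = tasks[k]
    r = c_k
    while r <= d_k:
        total = sum(
            min(workload_bound(c, d, t, r, slack[i]), r - c_k + 1)
            for i, (c, d, t) in enumerate(tasks)
            if i != k
        )
        nxt = c_k + total // m
        if nxt == r:
            return r
        r = nxt
    return None


def iterative_test(tasks, m):
    """Return True if every task is verified, False if no progress is made."""
    slack = [0] * len(tasks)          # step 1
    proved = [False] * len(tasks)
    while True:
        updated = False
        for k, (_c, d_k, _t) in enumerate(tasks):   # step 2
            r = response_bound(tasks, m, k, slack)
            if r is None:
                continue
            if not proved[k] or d_k - r > slack[k]:
                slack[k] = max(slack[k], d_k - r)
                proved[k] = True
                updated = True
        if all(proved):               # step 3
            return True
        if not updated:               # step 4
            return False
        # step 5: run another round with the improved slack values


schedulable = iterative_test([(1, 4, 4), (1, 4, 4), (2, 6, 6)], m=2)
inconclusive = iterative_test([(1, 1, 2), (1, 1, 3), (5, 6, 6)], m=2)
```

Each round either verifies a new task or increases an integer slack, so the loop terminates. A `False` result only means the sufficient test could not verify the task set.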
RTA refinement for Fixed Priority
- The interference on higher-priority tasks is always null: $I_k^i = 0$ for all $i > k$
- An upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration:
  $R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i < k} \min\left(W_i(R_k),\; R_k - C_k + 1\right) \right\rfloor$
RTA refinement for EDF
- A different bound can be derived analyzing the worst-case workload in a situation in which:
  - The interfering and interfered tasks have a common deadline
  - All jobs execute as late as possible
- An upper bound on the WCRT of task k is given by the fixed point of Rk in an iteration built on this bound
Complexity
- Pseudo-polynomial complexity
- Fast average behavior
  - We verified the schedulability of millions of task sets in a few minutes on a normal device
- Lower complexity for Fixed Priority systems
  - At most one slack update per task, if slacks are updated in decreasing priority order
- Possible to reduce complexity by limiting the number of rounds
Polynomial complexity test
- A simpler test can be derived by avoiding the iterations on the response times
- A lower bound on the slack of τk is computed directly, without iterating on Rk
- The iteration on the slack values is the same
- Performance comparable to the RTA-based test
- Complexity down to O(n²)
Experimental results for EDF
[Figure: number of task sets vs. task set utilization; curves: total task sets generated, I-BCL EDF (our test), Goossens et al. ’03, Baker et al. ’07, Bertogna et al. ’05]
- 2 processors
- Constrained deadlines
- 1,000 task sets generated
- Improvement over existing solutions: our test is consistently superior at all utilizations
Experimental results for FP
[Figure: number of task sets vs. task set utilization; curves: total task sets generated, I-BCL FP (our test), Bertogna et al. ’05, Baker et al. ’07, density bound]
- 2 processors
- Constrained deadlines
- 1,000 task sets generated
- Our test is consistently superior at all utilizations
FP vs EDF
[Figure: number of task sets vs. task set utilization; curves: total task sets generated, I-BCL FP (our FP test), Baker et al. ’07, I-BCL EDF (our EDF test), Goossens et al. ’03]
- 4 processors
- Constrained deadlines
- 1,000 task sets generated
- Our FP test is consistently superior to all tests at every utilization
Conclusions
- Multiprocessor real-time systems are a promising field to explore
- Existing results are still few and far from tight conditions; we contributed to filling this gap
- Future work:
  - Find tighter schedulability tests
  - Use our techniques to analyze the efficiency of other scheduling algorithms (EDZL, EDF-US, FP-DS, etc.)
  - Take into account exclusive access to resources
  - Integrate into a Resource Reservation framework
The end
Other research activities
- Limited-preemption EDF
- Reducing Resource Holding Times
- Shared resources and open environments
ARM’s MPCore
Frequency and power
- f = operating frequency
- V = supply voltage (V ≈ 0.3 + 0.7 f)
  - Reducing the voltage causes a higher frequency reduction
- Ileak = leakage current (becomes non-negligible)
- P = Pdynamic + Pstatic = power consumed
  - Pdynamic ∝ A C V² f (main contributor until hundreds of nm)
  - Pstatic ∝ V Ileak (always present, due to subthreshold and gate-oxide leakage)
- Reducing V allows a quadratic reduction of Pdynamic
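A short calculation shows why these relations favor several slower cores. The sketch assumes normalized units (f = 1 and V = 1 at full speed), a constant activity and capacitance factor A·C, and ignores static power; the function names are illustrative.

```python
def supply_voltage(f):
    """Normalized supply voltage for normalized frequency f: V = 0.3 + 0.7 f."""
    return 0.3 + 0.7 * f


def relative_dynamic_power(f, cores=1):
    """Dynamic power relative to one core at full speed: cores * V^2 * f."""
    v = supply_voltage(f)
    return cores * v * v * f


single_full_speed = relative_dynamic_power(1.0)
dual_half_speed = relative_dynamic_power(0.5, cores=2)
```

Under these assumptions, two cores at half frequency offer the same aggregate cycle rate as one core at full frequency for roughly 42% of the dynamic power.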
Power density
[Figure: power density (W/cm², log scale from 1 to 10000) by year (1970 to 2010) for Intel processors from the 4004 to the P6 and Pentium, with hot-plate, nuclear-reactor and rocket-nozzle reference levels]
How many cores in the future?
- Intel’s 80-core prototype already available
  - Able to transfer a TB of data/s (Core 2 Duo reaches 1.66 GB of data/s)
  - To be released in 5 years
Beyond 2 billion transistors/chip
- Intel’s Tukwila
  - Itanium based
  - 2.046 B FET
  - Quad-core
  - 65 nm technology
  - 2 GHz on 170 W
  - 30 MB cache
  - 2 SMT: 8 threads/ck
Intel’s timeline
From 4004 (1971) to Pentium D (2005):
- Tech: 10 µm → 65 nm (150x)
- f: 100 kHz → 3 GHz (25,000x)
- # MOS: 2,300 → 291,000,000 (125,000x)
- P: 0.2 W → 100 W (500x)
- Vdd reduced (from 5 V to ~1 V)
- Not all MOS change state
  - Great part of the chip is occupied by cache
- f ∝ Vdd − Vtt
- Ileak ∝ Vdd, 1/Vtt
Intel 4004 (1971) and Intel Pentium IV (2000)
Itanium temperature plot
- Incandescent light bulb: 25-100 W
- Compact fluorescent lights: 5-30 W
- Typical car: 25 kW
- Human climbing stairs: 200 W
- 1 kWh = 1 kW constantly supplied for 1 h
- ENEL: 0.13-0.18 €/kWh
Density and utilization bounds
Uniprocessor feasibility
- Sporadic or synchronous periodic tasks
  - Implicit deadlines: linear test, Utot ≤ 1
  - Constrained or arbitrary deadlines: unknown complexity; pseudo-polynomial test if Utot < 1: EDF until Utot/(1 − Utot) · max(Ti − Di)
- Asynchronous periodic tasks
  - Implicit deadlines: linear test, Utot ≤ 1
  - Constrained or arbitrary deadlines: strong NP-hard; exponential test: EDF until 2H + Dmax + rmax
Uniprocessor static priority run-time scheduling
- Sporadic or synchronous periodic tasks
  - Implicit deadlines: RM optimality
  - Constrained deadlines: DM optimality
  - Arbitrary deadlines: unknown complexity; Audsley’s bottom-up algorithm (exponential complexity)
- Asynchronous periodic tasks: unknown complexity; Audsley’s bottom-up algorithm (exponential complexity)
Uniprocessor static priority feasibility
- Sporadic or synchronous periodic tasks
  - Implicit deadlines: pseudo-polynomial test, RM until Tmax or RTA
  - Constrained deadlines: pseudo-polynomial test, DM until Dmax or RTA
  - Arbitrary deadlines: unknown complexity; Audsley’s bottom-up algorithm (exponential)
- Asynchronous periodic tasks: unknown complexity; strong NP-hard; Audsley’s bottom-up algorithm (exponential)
Uniprocessor static priority schedulability
- Sporadic or synchronous periodic tasks
  - Implicit deadlines: pseudo-polynomial simulation until Tmax or RTA
  - Constrained deadlines: pseudo-polynomial simulation until Dmax or RTA
  - Arbitrary deadlines: unknown complexity; Lehoczky’s test (exponential)
- Asynchronous periodic tasks: strong NP-hard; simulation until 2H + rmax or other exponential tests
Multiprocessor feasibility
- Implicit deadlines (all task models): linear test, Utot ≤ m
- Constrained or arbitrary deadlines
  - Sporadic: unknown complexity; synchronous periodic arrival is not a critical instant
  - Synchronous periodic: unknown complexity; Horn’s algorithm in (0, H]
  - Asynchronous periodic: unknown complexity; strong NP-hard
Multiprocessor run-time scheduling
- Implicit deadlines
  - Sporadic: P-fair, GPS
  - Synchronous or asynchronous periodic: P-fair, GPS, LLREF, EKG, BF
- Constrained or arbitrary deadlines
  - Sporadic: requires clairvoyance
  - Synchronous periodic: unknown complexity; clairvoyance not needed; Horn’s algorithm in (0, H]
  - Asynchronous periodic: unknown complexity; clairvoyance not needed
Feasibility conditions
[Figure: task sets with Utot > m or load* > m are not feasible; task sets with Σi Ci/min(Di, Ti) ≤ m are feasible; the region in between (marked "???") is covered by sufficient feasibility and schedulability tests]
Multiprocessor static job priority feasibility
- Implicit deadlines: unknown complexity
- Constrained or arbitrary deadlines
  - Sporadic: unknown complexity; synchronous periodic arrival is not a critical instant
  - Synchronous periodic: unknown complexity; simulation until the hyperperiod for all N! job priority assignments
  - Asynchronous periodic: unknown complexity; strong NP-hard
Multiprocessor static job priority schedulability
- Implicit deadlines: unknown complexity
- Constrained or arbitrary deadlines
  - Sporadic: unknown complexity; synchronous periodic arrival is not a critical instant
  - Synchronous periodic: unknown complexity; simulation until the hyperperiod
  - Asynchronous periodic: unknown complexity; strong NP-hard
Multiprocessor static priority run-time scheduling (implicit, constrained and arbitrary deadlines)
- Periodic (synchronous or asynchronous): unknown complexity; Cucu’s optimal priority assignment
- Sporadic: unknown complexity
Multiprocessor static priority feasibility (implicit, constrained and arbitrary deadlines)
- Sporadic: unknown complexity; synchronous periodic arrival is not a critical instant
- Synchronous periodic: strong NP-hard; simulation until the hyperperiod for all n! priority assignments
- Asynchronous periodic: strong NP-hard; simulation on an exponential feasibility interval for all n! priority assignments
Multiprocessor static priority schedulability (implicit, constrained and arbitrary deadlines)
- Sporadic: unknown complexity; synchronous periodic arrival is not a critical instant
- Synchronous periodic: unknown complexity; simulation until the hyperperiod
- Asynchronous periodic: strong NP-hard; simulation on an exponential feasibility interval
RTA for Uniprocessors
- For FP, the worst-case response time of a task is given by the first instance released at a critical instant
- For EDF, it is given by an instance in a busy interval starting with a critical instant
- With these observations it is possible to compute the WCRT of all tasks
- Example: for FP, the WCRT of a task k is given by the fixed point of:
  $R_k \leftarrow C_k + \sum_{i<k} \lceil R_k / T_i \rceil\, C_i$
RTA refinement for EDF
- The bound for generic work-conserving schedulers is still valid
- A different bound can be derived analyzing the worst-case workload in a situation in which:
  - The interfering and interfered tasks have a common deadline
  - All jobs execute as late as possible
[Figure: timeline of the jobs of τi, annotated with Si, Ci, Di, Ti, aligned with the deadline of τk at the end of a window of length Dk]
Polynomial complexity test
- A lower bound on the slack of τk is given by separate expressions for EDF and for FP
Limiting the number of iterations