Hard Real-Time Systems and Many-Core Platforms
Alan Burns
Real-Time Systems Group, Department of Computer Science, University of York, United Kingdom

Motivation
§ Temporal predictability is getting increasingly difficult with multi- and many-core platforms
§ Advice to ‘use only one core’ in avionics white papers
§ How to bound interference across the hardware platform?

Temporal Requirements
§ General task/thread model
§ Mixed-criticality sporadic tasks
  - Released by events or the passage of time
  - Given any interval of time t, the maximum number of releases of each task within t is known
§ Tasks synchronise and communicate as required
  - i.e. not constrained
§ It follows that a time-triggered static schedule is not the solution

Outline
§ Allocation
§ Networks-on-Chip
  - Wormhole networks
§ Real-Time Analysis
  - A tale of mistakes, corrections, errors and plagiarism
§ Predictability as an emergent property

Allocation
§ Most theoretical results assume homogeneous architectures
§ Some work on
  - Two processor types, e.g. big and small
  - Variable speed processors
§ FPGAs, GPUs
§ More general heterogeneous models beginning to appear in the literature

Dynamic Allocation/Partitioning
§ Dhall effect, from 1978
§ For single-processor systems, EDF is optimal, and DMPA is optimal for fixed-priority scheduling
§ BUT
  - 50 tasks each needing 2 ms every 100 ms
  - 1 task requiring 100 ms every 101 ms
  - Global EDF and DMPA leave this task set unschedulable on 50 cores
  - Clearly schedulable on 2 cores
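
The arithmetic behind this example can be checked with a minimal sketch. It assumes all times are in ms, deadlines equal periods, and every task releases its first job at t = 0; the function names are illustrative and not from the talk.

```python
from fractions import Fraction

# Task set from the slide (assumed units: ms, deadline = period).
LIGHT_COUNT, LIGHT_C, LIGHT_T = 50, 2, 100   # 50 tasks: 2 ms every 100 ms
HEAVY_C, HEAVY_T = 100, 101                  # 1 task: 100 ms every 101 ms


def heavy_finish_global_edf(cores):
    """Finish time of the heavy task's first job under global EDF.

    All light jobs have the earlier deadline (100 < 101), so they occupy the
    cores first. The heavy job only gets a core once fewer light jobs remain
    than there are cores, i.e. after LIGHT_COUNT // cores full rounds.
    """
    rounds_before_heavy = LIGHT_COUNT // cores
    return rounds_before_heavy * LIGHT_C + HEAVY_C


def partitioned_utilisations():
    """Per-core utilisation when light tasks share one core and the heavy task has its own."""
    light_core = Fraction(LIGHT_COUNT * LIGHT_C, LIGHT_T)
    heavy_core = Fraction(HEAVY_C, HEAVY_T)
    return light_core, heavy_core


finish = heavy_finish_global_edf(50)
print(finish, "> deadline", HEAVY_T, "->", finish > HEAVY_T)
print([float(u) for u in partitioned_utilisations()])
```

On 50 cores the heavy job starts at 2 ms and finishes at 102 ms, missing its 101 ms deadline. With a two-core partition neither core exceeds utilisation 1, which is sufficient for EDF on each core.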

Dynamic Allocation/Partitioning
§ Later theoretical results indicate benefits from global (dynamic) task scheduling
§ But
  - These results ignore overheads (migration costs)
  - They make modelling of inter-task communication more problematic
§ Hence partitioned task allocation is the main focus of more practical research
§ Semi-partitioned allocation gets most of the benefits
  - C=D scheme, moves just one task per core

Communication
§ Bus-based architectures are problematic
  - In effect unbounded interference
§ Need a resource that can be scheduled (managed)
§ Hence Networks

Networks: key characteristics
§ topology
  - mesh, star, torus…
§ routing protocol
  - deterministic, adaptive…
§ buffering
  - FIFO, SAFC, SAMQ, DAMQ, hot potato…
§ flow control protocol
  - handshake, credit-based…
§ arbitration
  - round-robin, priority preemptive, priority non-preemptive, TDM…
§ switching protocol
  - store-and-forward, circuit, wormhole

Wormhole switching
§ A packet is routed and forwarded as soon as the header flit has arrived
  - payload flits follow the header
§ Input ports do not need to buffer a complete packet
  - flits of a packet can be stored across multiple routers
§ Trade-off between buffer overheads and network contention
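
To illustrate why forwarding on header arrival matters, here is a rough sketch comparing contention-free latency under store-and-forward and wormhole switching. It is a textbook simplification: each flit is assumed to take one `flit_time` per hop, and router delay is ignored. The function names are illustrative.

```python
def store_and_forward_latency(flits, hops, flit_time=1):
    """Each hop waits for the complete packet before forwarding it."""
    return hops * flits * flit_time


def wormhole_latency(flits, hops, flit_time=1):
    """The header crosses `hops` links; the remaining flits follow in a pipeline."""
    return (hops + flits - 1) * flit_time


# A 128-flit packet over 4 hops, with no contention.
print(store_and_forward_latency(128, 4))  # whole packet buffered at every hop
print(wormhole_latency(128, 4))           # pipelined across routers
```

The saving comes at a price: a blocked wormhole packet holds buffers and links in several routers at once, which is the contention side of the trade-off above.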

Wormhole Switching
[Figure: animation of a packet crossing a chain of switches to a terminal; the packet header advances first and the packet data follows it]

Wormhole Networks-on-Chip
§ The small buffering overheads of wormhole networks are particularly attractive to a special class of resource-constrained networks: Networks-on-Chip (NoCs)
  - small buffers mean smaller area and lower energy dissipation
[Figure: mesh of PEs, each attached to a router (R)]

Wormhole Networks-on-Chip
[Figures: mesh NoC in which each core (PE) attaches to a router (R) and routers are joined by links; router detail showing data in and data out ports, arbitration, and routing & transmission control]

NoC parallelism and scalability
§ Multiple connections simultaneously

NoC performance
§ task contention leads to latency variability
§ link contention leads to latency variability

Task Scheduling
§ Priority-based scheduling
  - Each task has a priority derived from its temporal characteristics
  - A WCET is required for each task
  - Response-time analysis enables the worst-case completion time (R) to be calculated
  - This can then be compared with the task deadline to ensure R ≤ D
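
As a reminder of how the response-time calculation works, the sketch below uses the classical fixed-point recurrence for preemptive fixed-priority tasks on one core, R = C + sum over higher-priority tasks j of ceil(R / Tj) * Cj. It assumes no blocking or jitter, integer time units and D ≤ T; the task values are made up for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def response_time(tasks, i):
    """Worst-case response time of tasks[i], or None if it exceeds the deadline.

    `tasks` is a list of (C, T, D) tuples ordered from highest to lowest priority.
    """
    C, _, D = tasks[i]
    higher = tasks[:i]
    R = C
    while True:
        nxt = C + sum(ceil_div(R, Tj) * Cj for Cj, Tj, _ in higher)
        if nxt > D:
            return None   # deadline missed, stop iterating
        if nxt == R:
            return R      # fixed point reached
        R = nxt


# Illustrative task set (C, T, D), highest priority first.
tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
print([response_time(tasks, i) for i in range(len(tasks))])
```

Each task is schedulable if the returned R is not None, i.e. R ≤ D.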

Priority preemptive virtual channels
§ Wormhole NoCs using virtual channels with priority preemptive arbitration can discriminate between packets of different levels of urgency

Priority preemptive virtual channels
[Figure: router detail with one virtual channel per priority ID, data_in and credit_out on the input side, data_out and credit_in on the output side; routing & transmission control forwards the highest priority with remaining credit]

NoC Performance Evaluation
§ Many approaches to evaluate NoC performance
  - full system prototyping
    • cores + NoC in FPGA, running OS + application
    • extremely costly setup time, can only explore few design alternatives
  - accurate system simulation
    • cycle-accurate model of cores + NoC, running OS + application
    • extremely long simulation time, can only explore few design alternatives
  - approximately-timed system simulation
    • approximately-timed model of cores + NoC, executing an abstract model of the OS + application
  - analytical system performance models
    • worst-case latency estimation for restricted application characteristics

Real-Time Analysis
§ A packet flow τi is a potentially unbounded sequence of packets, characterised by
  - priority Pi
  - period Ti: smallest inter-arrival interval
  - no-load latency Ci: deterministic, a function of # flits, # hops, router latency and link latency
  - release jitter JRi
  - deadline Di
§ Analysis aims to calculate the worst-case response time Ri for every τi, and check if Ri ≤ Di
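
The flow model can be written down as a small record, together with one plausible way of computing the no-load latency. The slide only says that Ci is a function of flits, hops, router latency and link latency; the pipeline formula below (header pays every router and link, the remaining flits follow one link latency apart) is an assumption for illustration, as are all names.

```python
from dataclasses import dataclass


@dataclass
class PacketFlow:
    priority: int          # Pi
    period: int            # Ti, smallest inter-arrival interval
    no_load_latency: int   # Ci
    release_jitter: int    # JRi
    deadline: int          # Di


def no_load_latency(flits, routers, links, router_latency, link_latency):
    """Assumed pipeline model of Ci, in the same time unit as the latencies."""
    header = routers * router_latency + links * link_latency
    body = (flits - 1) * link_latency
    return header + body


# Illustrative values: 128 flits over 4 links and 3 routers.
c = no_load_latency(flits=128, routers=3, links=4, router_latency=2, link_latency=1)
flow = PacketFlow(priority=1, period=1000, no_load_latency=c, release_jitter=0, deadline=1000)
print(flow)
```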

Real-Time Analysis
§ First approaches to analyse priority-preemptive wormhole networks came during the 1990s
  - Mutka (Workshop Parallel Distr Real-Time Sys 1994)
  - Hary and Ozguner (IEE Proc CDT 1997)
§ Key idea is to consider the entire path of a packet as a single shared resource
  - the worst-case latency bound of a packet flow can be found by analysing the higher-priority packet flows that share at least one link of its route
  - assumes that interference from higher-priority flows will be bounded by the number of “hits”, with each hit bounded by the no-load latency of the interfering flow
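
The "path as a single shared resource" idea can be sketched as a response-time recurrence over flows. This is an illustration only, not the exact formulation of any of the cited papers: it counts hits as ceil((R + Jj) / Tj), charges each hit the interferer's no-load latency Cj, and models direct interference alone. Routes are lists of links, and a lower priority number means higher priority; both conventions are assumptions made here.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def shares_link(route_a, route_b):
    """True if the two routes have at least one link in common."""
    return bool(set(route_a) & set(route_b))


def flow_response_time(flows, i):
    """Worst-case latency of flows[i] under direct interference only, or None if > D."""
    fi = flows[i]
    interferers = [f for f in flows
                   if f["P"] < fi["P"] and shares_link(f["route"], fi["route"])]
    R = fi["C"]
    while True:
        nxt = fi["C"] + sum(ceil_div(R + f["J"], f["T"]) * f["C"] for f in interferers)
        if nxt > fi["D"]:
            return None
        if nxt == R:
            return R
        R = nxt


# Illustrative flows on a chain of routers r0..r3.
flows = [
    {"P": 1, "T": 10, "C": 2,  "J": 0, "D": 10, "route": [("r0", "r1"), ("r1", "r2")]},
    {"P": 2, "T": 20, "C": 5,  "J": 0, "D": 20, "route": [("r1", "r2"), ("r2", "r3")]},
    {"P": 3, "T": 50, "C": 10, "J": 0, "D": 50, "route": [("r2", "r3")]},
]
print([flow_response_time(flows, i) for i in range(len(flows))])
```

In this example the lowest-priority flow shares no link with the highest-priority one, so the sketch charges it no interference from that flow, even though the highest-priority flow can delay the middle one.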

Real-Time Analysis
[Figure: four packet flows τ1 to τ4 routed over a mesh NoC, with their interference graph; P1 > P2 > P3 > P4]

Real-Time Analysis
§ Kim et al (Conf Parallel Proc 1998) recognised that upstream indirect interference can also affect latency upper bounds
§ Shi & Burns (NOCS 2008) produced a response-time formulation that conservatively models upstream indirect interference
[Figure: three packet flows τ1 to τ3 routed over a mesh NoC; P1 > P2 > P3]

Real-Time Analysis
§ Several lines of work were derived from Shi and Burns 2008
  - well cited: 158 (Google Scholar)
  - many works on priority assignment and task mapping
  - a few on analysis improvement, aiming to make it tighter
    • Nikolic et al (arXiv 2016) considered that the interference should not be calculated based on the full path, but on the contention domain
    • Kashif et al (IEEE Trans Comp 2015) attempted to analyse packet paths in a link-by-link manner, but assumed infinite buffering (i.e. did not consider backpressure)
    • Kashif and Patel (RTAS 2016) attempted to consider buffering and backpressure effects
    • all of them upper-bounded by Shi and Burns 2008

Real-Time Analysis
§ Xiong et al (GLSVLSI 2016) made two key contributions
  - new formulation of the downstream indirect interference problem, aiming to capture a previously unseen issue: multi-point blocking
    • showed that Shi and Burns 2008 is optimistic and unsafe (and so are all the analyses upper-bounded by it)
  - new formulation of the upstream indirect interference problem, aiming to be tighter than Shi and Burns 2008
    • was shown to be flawed by Indrusiak et al (arXiv 2016)

Real-Time Analysis
§ Xiong et al published a corrected analysis in IEEE Trans Comp in 2017
  - including the fix from Indrusiak et al (arXiv 2016) for the upstream indirect interference
  - accounting for downstream indirect interference as if it were direct interference, making worst-case response times safe even under multi-point blocking scenarios

Newer Analysis: DATE 2018 (Best Paper)
§ Aims to reduce the pessimism when accounting for downstream indirect interference
§ Key ideas
  - buffered interference: multi-point blocking is only caused by flits buffered over the routers shared between the flow under analysis and the flow causing interference
  - buffered interference will reoccur for every downstream indirect interference hit suffered by the flow causing interference

Proposed Analysis
§ Key ideas
  - buffered interference: multi-point blocking is only caused by flits buffered over the routers shared between the flow under analysis and the flow causing interference
  - upper-bounding the buffered interference bi_ij caused by τj on τi
[Equation not recovered; its annotated terms are: buffer spaces per router, link latency, and number of hops shared between τi and τj]
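
The bound itself is not legible in this transcript, only its three annotated terms. The sketch below assumes the bound is simply their product (flits that fit in the shared routers, each costing one link latency to drain); that reading is an assumption, and the names `shared_hops` and `buffered_interference` are hypothetical.

```python
def shared_hops(route_i, route_j):
    """Number of links common to both routes (assumed meaning of 'hops shared')."""
    return len(set(route_i) & set(route_j))


def buffered_interference(buffer_spaces_per_router, link_latency, hops_shared):
    """Assumed upper bound: product of the three terms annotated on the slide."""
    return buffer_spaces_per_router * link_latency * hops_shared


# Illustrative routes sharing two links, 2-flit buffers, unit link latency.
route_i = [("a", "b"), ("b", "c"), ("c", "d")]
route_j = [("b", "c"), ("c", "d"), ("d", "e")]
print(buffered_interference(2, 1, shared_hops(route_i, route_j)))
```

Under this reading, smaller buffers or shorter shared path segments directly shrink the interference that multi-point blocking can contribute.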

Proposed Analysis
§ Key ideas
  - buffered interference will reoccur for every downstream indirect interference hit suffered by the flow causing interference
  - upper-bounding the number of downstream indirect interference hits for every flow that can cause downstream indirect interference on τi via τj
[Equation not recovered; annotated as: maximum number of hits that τj can suffer, causing buffered interference to be replenished]

Proposed Analysis
§ Caveat
  - there are cases when the amount of downstream indirect interference is not enough to fill the buffers to the maximum possible amount of buffered interference
  - in that case, revert to Xiong et al 2017 and consider downstream indirect interference instead

Evaluation
§ Large-scale evaluation using synthetic scenarios
  - 4 x 4 and 8 x 8 NoCs
  - each experiment has 100 randomly generated scenarios with N packet flows
    • packet flows have periods between 0.5 s and 0.5 ms, packet lengths between 128 and 4096 flits, rate-monotonic priority and random source and destination PEs
  - multiple experiments were performed by increasing N and checking how many of the 100 scenarios are fully schedulable, i.e. all of a scenario's flows have R ≤ D
    • for each analysis: Xiong et al (XLWX), Shi & Burns (SB), proposed (IBN)
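
A scenario generator along these lines might look like the sketch below. The slide gives only the ranges, so the uniform distributions, distinct source and destination, and all names here are assumptions rather than the setup actually used in the evaluation.

```python
import random


def generate_flow_set(n_flows, mesh_width, mesh_height, seed=None):
    """Generate one synthetic scenario with rate-monotonic priorities."""
    rng = random.Random(seed)
    pes = [(x, y) for x in range(mesh_width) for y in range(mesh_height)]
    flows = []
    for _ in range(n_flows):
        src, dst = rng.sample(pes, 2)               # assumed: source != destination
        flows.append({
            "period_s": rng.uniform(0.5e-3, 0.5),   # assumed uniform in [0.5 ms, 0.5 s]
            "flits": rng.randint(128, 4096),        # packet length in flits
            "src": src,
            "dst": dst,
        })
    # Rate-monotonic: the shorter the period, the higher the priority (1 = highest).
    flows.sort(key=lambda f: f["period_s"])
    for priority, flow in enumerate(flows, start=1):
        flow["priority"] = priority
    return flows


scenario = generate_flow_set(n_flows=40, mesh_width=4, mesh_height=4, seed=1)
print(scenario[0])
```

An experiment in the style described would generate 100 such scenarios for each N and count how many pass a given analysis.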

Evaluation: 4 x 4
[Figure: % schedulable flow sets against # flows per flow set, for SB, XLWX, IBN 2 and IBN 100]

Evaluation: 8 x 8
[Figure: % schedulable flow sets against # flows per flow set, for SB, XLWX, IBN 2 and IBN 100]

How can we improve this situation?
§ Mixed Criticality view
§ Change the wormhole protocol
  - Force the world to behave according to the analysis
§ Can temporal predictability become an emergent property of the platform?

Randomise Behaviour
§ Allow instruction times to be random
§ But not necessarily iid, so weakly independent (e.g. modelled as a Markov chain)
§ Assumes timing requirements are at the task level (e.g. milliseconds)
§ Assumes each task contains a large number of instructions
§ Can we derive WCET as a simple function of average-case execution time?

Example with probability of failure 10^-9
[Figure: K (y-axis, 0.96 to 1.06) over an x-axis running from 1000 to 10000000, with series Bayesian (I), Bayesian (P), Simple (I) and Simple (P)]

Real Hardware
§ Can a real platform behave randomly?
§ Yes, if overall behaviour is the sum of many minor effects
§ In this sense COTS and complex platforms are better!
§ Consider a 4-core RPi: it contains a number of unmodelled non-deterministic features

Conclusions
§ Modeling the temporal behaviour of modern many-core platforms is hard
§ Two routes forward:
  - Provide managed resources at all levels
  - Provide unmanaged resources at all levels
§ For the former, wormhole routing is an important technology, although changes to the protocol could improve schedulability
§ For the latter, a change in mind-set is required!