Predictable Design of Embedded Systems using Networked Architectures
Henk Corporaal, www.ics.ele.tue.nl/~heco
ASCI Winterschool on Embedded Systems, Rockanje, March 2006

Outline
• Trends and design problems
• Unpredictability
• Platforms
• Predictable design
• Proposed design flow
• Open issues
Note: this lecture is not about a solved problem.

Outline
• Trends and design problems
  - Embedded systems everywhere
  - Design practice
  - Design complexity
  - Memory wall
• Unpredictability
• Platforms
• Predictable design
• Design flow
• Open issues

Embedded systems everywhere
• Convergence of the 3 Cs: computers, communications and consumer electronics
• The computer enters its 3rd phase: computing power - networking - intelligent processing
• The world is one network: all information and communication available wherever and whenever
We get a smart environment.

Design practice: Informal system specification
[Figure: system people split the system into tasks in a paper spec; hardware people work in VHDL and Verilog, software people in C and ASM; integration follows at the end]

Design practice
[Figure: Y-chart (Gajski-Kuhn) with the axes behavioral specification, structure description and physical realization, and the levels system, algorithm, R/T, logic, circuit]
• A design flow is a path in the Y-chart
• The flow is largely manual down to RT level

Design complexity problem
[Figure: complexity (log scale, 10^1 to 10^3) versus year. Process technology: +58%, HW design productivity: +21%, SW productivity: +8%. The differences between the curves are the HW gap and the SW gap]

Hitting the memory wall
[Figure: performance (log scale, 1 to 1000) versus time, 1980 to 2005. µProc ("Moore's Law"): 55%/year. DRAM: 7%/year. Processor-memory performance gap grows 50%/year. Source: Patterson]

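The gap figure can be checked by compounding the two growth rates from the slide. The sketch below is only this arithmetic; the function name and the normalisation to 1 at year 0 are assumptions. Compounding 55% against 7% gives roughly 45% per year, the same order as the 50% quoted in the figure.

```python
# Annual improvement rates as quoted on the slide.
CPU_GROWTH = 0.55   # processor performance: +55% per year
DRAM_GROWTH = 0.07  # DRAM performance: +7% per year


def gap_after(years: int) -> float:
    """Processor/DRAM performance ratio after `years`, normalised to 1 at year 0."""
    return ((1 + CPU_GROWTH) / (1 + DRAM_GROWTH)) ** years


yearly_gap_growth = gap_after(1) - 1
print(f"gap grows about {yearly_gap_growth:.0%} per year")
print(f"gap after 20 years: about {gap_after(20):.0f}x")
```
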
Unpredictability at all levels
• applications
• architectures
• DSM VLSI design
Uncertainty increases at all levels.

Application: Two forms of unpredictability
[Figure: video processing task graph (Video In 1, Video In 2, NR, HSRC, VSRC, mix, Txt gen, 100 Hz, Peak, Matrix, memories) and a plot of resources versus time]
• Applications can be data dependent
• Applications may have different scenarios

In addition: dynamically changing set of applications
Multi-standard modem operation [Philips EVP]
• Several applications have to be activated simultaneously
  - Too many combinations for an analysis at design time (non-deterministic events)
[Figure: compute load (0 to 125) versus time for a multi-standard modem. Phases: initial acquisition, inter-system handover, WLAN acquisition, UMTS connected / WLAN acquisition, WLAN connected / UMTS monitoring. Tasks: SCH search (SCH), CPICH search, RAKE chip-rate processing, RAKE sym-rate proc., WLAN receiver]

Architecture unpredictability
Local schedulers:
• OS
  - task switching
  - interrupts
• interconnect
  - busses, bridges
  - networks
• memory controllers
  - external memory
• cache strategy
  - cache pollution
e.g. RR, TDMA, FCFS, LRU, EDLF, FIFO, priority, …
What is the global (end-to-end) behavior, composed of interacting local solutions?
[Figure: CPUs with caches ($) and IP blocks, connected through interconnects to an external memory arbiter]

DSM VLSI Unpredictability
• Global wiring delay becomes dominant over gate delay (timing closure)

DSM VLSI Unpredictability
[Figure: length of the isosynchronous zone as a function of frequency]
Other DSM problems:
• Clock distribution, skew
• VDD and VSS voltage drop
• Signal integrity, cross-talk
• Variance in process parameters increases

Unpredictability: Design Closure problems
Design closure =
• a realization meets all requirements, including functionality, speed, power, area, yield, etc., without design iterations
[Figure: application, mapping & scheduling, architecture, placement & routing, FPGA realization / VLSI realization]
Closure problem at all levels.

Unpredictability: Design Closure problems
[Figure: computational requirements (0% to 1200%) versus time; the requirements vary by orders of magnitude]
Mapping with performance guarantees looks impossible!

Solution ingredients:
• Higher abstraction levels
• SW and HW IP reuse / PnP principle
  - Standards
• Avoid large design iterations
  - Design correct by synthesis
• Avoid worst-case resource requirements
How do we achieve all of this?

What is a platform?
Definition: A platform is a generic, but domain-specific information processing (sub-)system
• Generic means that it is flexible, containing programmable component(s).
• Platforms are meant to quickly realize your next system (in a certain domain).
• Single chip?

Platforms, why?
- Reuse
- Short time-to-market
- High quality
• Flexible and programmable
• Large software component
• Standardization
• Optimized for a specific domain
And you do not have to solve this design closure problem!

Platforms separate the design communities!
[Figure: design technology stack. Applications are mapped onto the platform with SDT (system design technology); the platform is built on enabling technologies with PDT (platform design technology)]

Platform examples: Digital camera
Sanyo [Okada 99]

TI OMAP
[Figure annotations:]
• Up to 192 Mbyte off-chip memory
• 192 Kbyte shared SRAM
• 8 Kbyte data cache (2-way, 512 lines of 16 bytes)
• Write buffer (17 elements)
• 16 Kbyte (2-way)
• 8 Kbyte mem (2 x 4K)
• 64 Kbyte dual port (8 x 4K x 16b)
• 96 Kbyte single port (12 x 4K x 16b)
• 32 Kbyte ROM

SpaceCake (Philips Research)
• Homogeneous: set of equal tiles
• Per tile, e.g.:
  - n * MIPS
  - m * TriMedia
  - Accelerators
  - k * L2 cache bank
  - Shared memory
  - Cache coherency
  - Big interconnect switch
• Inter tile:
  - Router
  - Message passing
  - Working on inter-tile cache coherence
[Figure: single tile with L2 cache memory banks]

IMAGINE Stream Processor (Stanford)
• IMAGINE = SIMD of VLIWs
• It is controlled by a host processor, which sends it stream instructions (load, store, receive, send, VLIW op, load microcode)

Hybrid FPGAs: Xilinx Virtex 4-Pro
[Figure, courtesy of Xilinx (Virtex II Pro): GHz IO with up to 16 serial transceivers, PowerPCs, memory blocks & multipliers, reconfigurable logic blocks]

Fundamental platform design decisions
• Homogeneous versus heterogeneous?
• Bus versus network?
• Shared memory versus message passing?
• QoS support, guarantees built in?
• Generic versus application specific?
• What types of parallelism to support?
  - ILP, DLP, TLP
• Focus on performance, power or cost?
• Memory organisation?
• HW or SW reconfigurable?
And further:
• OS support, middleware?
• Mapping support?

Homogeneous or Heterogeneous
• Homogeneous:
  - replication effect
  - memory dominated anyway
  - solve realization issues once and for all
  - less flexible

Homogeneous or Heterogeneous
• Heterogeneous:
  - more flexible
  - better fit to application domain
  - smaller increments
  - no tile reuse

Homogeneous or Heterogeneous
• Middle-of-the-road approach:
  - Flexible tiles
  - Fixed tile structure at top level
[Figure: tiles connected through routers]

HW or SW reconfigurable?
[Figure: reconfiguration time (reset, context, loopbuffer, 1 cycle) versus data path granularity (fine to coarse), with configuration bandwidth along the diagonal. FPGA: spatial mapping. VLIW: temporal mapping. Subword parallelism is also shown]

Outline
• Trends and design problems
• Unpredictability
• Platforms
• Predictable design
  - Current practice
  - Predictability
  - Architecture consequences
  - Design consequences
• Design flow
• Open issues

How should we design?
• Trajectory from idea to realization
• Decisions based on models
  - Abstract from implementation details (not all known yet)
  - Relatively cheap to create, validate and simulate
[Figure: an idea leads to concepts, requirements and a design problem. During design time the loop "generate ideas, construct models, evaluate properties, make design decisions" steers the trajectory towards the realization]

Current practice: mapping is easy, but...
• Given
  - reference C code for the application, e.g. MPEG-4 motion estimation
  - platform: SUPERDUPER-LX50
• Task
  - map the application onto the architecture
• But … wait a moment:
    me@work> CC -o2 mpeg4_me.c
    Thank you for running SUPERDUPER-LX50 compiler.
    Your program uses 257321886 bytes memory, 78 Watt, 428798765291 clock cycles
[Figure: idea and reference code fragment "a=b*5+d; for (...) {..}"]

Current design process
[Figure: application, mapping, "constraints OK?"; on "no" the mapping is repeated, on "yes" the flow ends]
• Post analysis: check constraints after mapping
• Simulation based
• Does it still work for other data?
• Does it still work when other applications are active?
• Too many iterations
  - Easy to program, hard to tune
• Can this be improved?
  - e.g. constraints = input

Predictable design
What is it?
• Being able to reason at a high level about a design (in terms of functional and non-functional properties), and
• Being able to realize this design without time-consuming iterations in the design flow (design closure)
How:
• Predictable architecture
  - Making resources predictable
  - Proper modeling of less predictable elements
• Predictable design flow
  - Compositionality
  - Composability
  - Design-time analysis
  - Run-time analysis

Making architectures predictable
• Getting rid of all unpredictable elements
• Caches?
  - No problem, but the WCET estimate may be large and unacceptable!
  - Software controlled:
    · locked cache lines
    · non-cacheable memory
    · controlled replacement
• Shared memory
• Communication

Making architectures predictable: NoC
Philips AETHEREAL
• The router provides both guaranteed throughput (GT) and best effort (BE) services to communicate with IPs.
• The combination of GT and BE leads to efficient use of bandwidth and a simple programming model.
[Figure: IPs attached through network interfaces to a network of routers (R)]

Making the NoC predictable: how to support GT traffic?
Time wheel concept
• Control traffic injection at the network interface
[Figure: time wheel with slots 1 to 8, rotating with time]

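The time wheel idea can be sketched as a slot table: each slot is either reserved for one GT connection or left free for BE traffic, and a connection may only inject in its own slots. The code below is a minimal illustration under that assumption, not the actual AETHEREAL implementation; the class, its methods and the connection names are hypothetical. The 8 slots match the wheel in the figure.

```python
from typing import Optional


class TimeWheel:
    """Slot table of a network interface: each slot is reserved for one GT connection or free."""

    def __init__(self, num_slots: int = 8):
        self.slots: list[Optional[str]] = [None] * num_slots

    def reserve(self, connection: str, slot: int) -> None:
        """Reserve one slot for a GT connection; a slot has at most one owner."""
        if self.slots[slot] is not None:
            raise ValueError(f"slot {slot} already reserved")
        self.slots[slot] = connection

    def owner(self, time: int) -> Optional[str]:
        """Connection allowed to inject at `time`; None means the slot is free for BE."""
        return self.slots[time % len(self.slots)]

    def guaranteed_share(self, connection: str) -> float:
        """Fraction of the link bandwidth guaranteed to the connection."""
        return self.slots.count(connection) / len(self.slots)

    def worst_case_wait(self, connection: str) -> int:
        """Largest distance, in slots, between two consecutive reserved slots."""
        n = len(self.slots)
        owned = [i for i, c in enumerate(self.slots) if c == connection]
        if not owned:
            raise ValueError("no slot reserved for this connection")
        gaps = [(owned[(k + 1) % len(owned)] - owned[k]) % n or n
                for k in range(len(owned))]
        return max(gaps)


wheel = TimeWheel(8)
wheel.reserve("A", 0)
wheel.reserve("A", 4)
wheel.reserve("B", 2)
print(wheel.guaranteed_share("A"), wheel.worst_case_wait("A"))
```

Because the table is fixed, both the bandwidth share and the waiting bound follow from the reservation alone, independent of what other connections do.
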
Making the design flow predictable: Compositionality
• High-level design: P(x, y) if [P(a, b), ...] !
• Low-level design: P(x, y) if [P(a, b), ...] ?
[Figure: the same structure with elements x, a, b, y, z at both levels]

Making the design flow predictable
• Design time
  - Determine upper bounds on time and resources (Pareto curves)
  - Scenario discovery:
    · separate your application into parts for which the upper bounds are not too far from the worst case
[Figure: frequency versus load, with the load range split into scenarios Sc1, Sc2, Sc3]

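A small sketch of why scenarios help: if each scenario gets its own bound, the reserved load is on average much lower than with a single bound for the whole application. The profile data below is hypothetical, and observed maxima stand in for real upper bounds, which a real flow would have to derive by analysis.

```python
from collections import defaultdict


def scenario_bounds(samples):
    """Map each scenario to the largest load observed in it."""
    bounds = defaultdict(int)
    for scenario, load in samples:
        bounds[scenario] = max(bounds[scenario], load)
    return dict(bounds)


def expected_reservation(samples):
    """Average reserved load when every sample reserves its scenario bound."""
    bounds = scenario_bounds(samples)
    return sum(bounds[s] for s, _ in samples) / len(samples)


# Hypothetical profile: (scenario, load) pairs, loads in percent of one processor.
samples = [("Sc1", 20), ("Sc1", 25), ("Sc1", 22),
           ("Sc2", 50), ("Sc2", 55), ("Sc3", 95)]

single_bound = max(load for _, load in samples)
print("single bound for the whole application:", single_bound)
print("per-scenario bounds:", scenario_bounds(samples))
print("average reservation with scenarios:", round(expected_reservation(samples), 1))
```
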
What do we want? Design time analysis
• Single application
  - Reasoning about end-to-end timing constraints (for given resources and quality) = predictability
  - Which local arbitration mechanisms are needed?
  - How to translate this to the global level?
• Example:
  - Given:
    · comp. resources
    · bandwidth
    · buffer size
  - Result: throughput Pareto curve
[Figure: application graph with actors A1 to A5 mapped on processors P1 to P4; Pareto curve of 1/throughput versus cost (resources), with a point (q1, c1)]

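A Pareto curve like the one in the figure is what remains after discarding every mapping that is beaten by another one on both axes. A minimal sketch, assuming each candidate mapping is summarised as a hypothetical (cost, 1/throughput) pair where lower is better for both:

```python
def pareto_front(points):
    """Keep the points that no other point dominates; both coordinates are minimised."""
    def dominates(p, q):
        return p[0] <= q[0] and p[1] <= q[1] and p != q

    return sorted(p for p in points if not any(dominates(q, p) for q in points))


# Hypothetical mappings: (cost in resources, 1/throughput).
mappings = [(1, 40), (2, 25), (2, 30), (3, 25), (4, 10), (5, 12)]
front = pareto_front(mappings)
print(front)
```
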
Scenarios: MP3

What do we want? Composability
• Multiple applications
  - If app. 1 and app. 2 each fit individually, what can be said about the combination?
  - Concept of virtual platform
[Figure: actors A1 to A4 mapped on Proc 1 and Proc 2]

Predictability: Composability
Can we add Pareto points?
[Figure: quality Q versus cost (resources) for application 1 with point (q1, c1) and application 2 with point (q2, c2)]
(q1, c1) + (q2, c2) = (q1+q2, c1+c2) ?

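If the answer to the question were yes, combining two applications would reduce to adding every pair of points and filtering the result. The sketch below does exactly that; it assumes that quality and cost are additive, which is precisely what a composable platform would have to guarantee. The point values and function names are hypothetical.

```python
from itertools import product


def pareto_max_quality(points):
    """Pareto filter for (quality, cost): higher quality and lower cost are better."""
    def dominates(p, q):
        return p[0] >= q[0] and p[1] <= q[1] and p != q

    return sorted(p for p in set(points) if not any(dominates(q, p) for q in points))


def combine(app1, app2):
    """Operating points of the pair, ASSUMING quality and cost simply add."""
    sums = [(q1 + q2, c1 + c2) for (q1, c1), (q2, c2) in product(app1, app2)]
    return pareto_max_quality(sums)


# Hypothetical Pareto points (quality, cost) of two applications.
app1 = [(1, 2), (3, 5)]
app2 = [(2, 1), (4, 6)]
print(combine(app1, app2))
```

The combination of two Pareto sets is not automatically Pareto optimal: here the sum (5, 8) is dropped because (5, 6) gives the same quality at lower cost.
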
Problem: Predictable Resource utilization?
[Figure: two applications, A and B, each a graph of actors with execution time 50, are mapped and scheduled onto processors P1, P2 and P3]

Problem: Predictable Resource utilization?
• Add ordering dependences (edges)
[Figure: schedule of A and B on P1, P2 and P3 over t0 to t3, showing a scheduling conflict]
Only 50% processor utilization!

Where is the problem?
• Different actor orders give different throughput
• The number of possible orderings in the overall graph increases exponentially with the number of actors and individual graphs
• A complete analysis to obtain an optimal order is very difficult
• Different arbitration strategies are hard to model and analyze realistically

Problem: Too many possibilities!
[Figure: applications A, B and C]

So, what is Composability?
• The degree to which we can analyze the applications in isolation:
  - Throughput, latency, resource utilization, deadlock, switching / reconfiguration overhead, etc.
• Design-time analysis for the complete system is too expensive and often infeasible
• Each job should be executed as if it had access to its own dedicated resources: virtualization
• Consider applications separately and then reason about the behavior of the overall system

Providing a Bound for Resources
• The arbitration strategy plays an important role in determining the resource requirement
• A naive strategy leads to over-estimation of resources
• A worst-case estimate is not always possible
• Need a predictable arbitration mechanism
  - More 'realistic' worst-case bounds
  - Handle dynamism in the system
• An overall quality versus resources Pareto curve is needed

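As one example of a predictable arbiter, non-preemptive round-robin gives a simple bound that does not depend on the order in which the other actors arrive. The sketch below is an illustration under stated assumptions, not the method of the lecture: each actor has at most one pending firing, execution times are worst-case values, and the actor names and numbers are hypothetical.

```python
def rr_worst_case_response(exec_times, actor):
    """Worst-case response time of one firing under non-preemptive round-robin.

    Assumes every other actor on the processor fires at most once before
    `actor` gets its turn.
    """
    waiting = sum(t for name, t in exec_times.items() if name != actor)
    return waiting + exec_times[actor]


# Hypothetical actors sharing one processor, with worst-case execution times.
on_p1 = {"A1": 50, "B1": 50, "C1": 30}
print(rr_worst_case_response(on_p1, "A1"))
```

The bound for A1 only needs the execution times of the actors sharing its processor, so each application can be checked without enumerating all interleavings.
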
Making the design flow predictable: Run-time aspects
• Scalable applications
• QoS management
[Figure: each application n / scenario m has a local manager, which talks through a QoS protocol to the global manager of the platform]

Match quality with resources
[Figure: Quality^-1 versus computational requirements]

Design flow
[Figure: idea, requirements spec (C), models/spec, correct by synthesis, platform. Models of computation: Reactive Process Network (POOSL/SystemC), Kahn Process Network (YAPI), BDF, SDF]

RPN (Reactive Process Networks): events and streaming
Event part (Event_in to Event_out):
• Processing of events
• Finite state machine
• Controlling host CPU (e.g. ARM)
• RTOS; hard real-time
• 'Classical' SW complexity
Streaming part (Stream_in to Stream_out):
• Soft real-time
• Compute intensive
• Special hardware
[Figure: the two parts exchange mode and status]

POOSL Modeling Language
• Mathematically defined semantics
• Allows formal analysis of model properties
• Can formally describe:
  - concurrency
  - synchronous communication
  - timing (delay statements)
  - functionality
[Figure: processes P1 and P2, with the statement "delay 1;"]

POOSL: Phases of Model Execution
[Figure: state space versus model time; execution alternates between asynchronous execution of actions and synchronous time passage]

From Model to Realization
Model fragment:
    a()();
    sel
      delay d1; b()();
    or
      c()(); delay d2;
    les;
Possible (timed) execution traces:
• (S1, t1), (S2, t1), (S3, t1+d1), (S5, t1+d1)
• (S1, t1), (S2, t1), (S4, t1+d2), (S6, t1+d2)
• (S1, t1), (S2, t1+wcet(a)), (S3, t1+d1), (S5, t1+d1+wcet(b))
• (S1, t1), (S2, t1+wcet(a)), (S4, t1+wcet(a)+wcet(c)), (S6, t1+d2)

ε-Hypothesis: property preservation
• If the time-deviation between two timed execution traces is ε1, ε2 < ε

Extending SDF
SADF: Scenario Aware Data Flow
• Can deal with dynamism
• Still possible to reason about
  - deadlock,
  - resource utilization,
  - latency and throughput
• Currently implemented in POOSL

SADF example: MPEG-2 Decoder
• Pipelined MPEG-2 decoder for I and P frames
  - VLD and IDCT fire per macro-block
  - MC and RC fire per frame
  - FD (frame detector) models the control part of VLD that determines the frame type
  - Image size = 176 x 144
• I-frame
  - 99 macro-blocks
  - No motion vectors
• Px-frame
  - x macro-blocks
  - Motion vectors from VLD to MC
  - Previous frame from RC to MC
• P0-frame (still video)
  - Copy previous frame
• FD model based on occurrence probability of frame types
• Execution time distributions of kernels determined with profiling tool

Rate | I  | P0 | Px
a    | 0  | 0  | 1
b    | 0  | 0  | x
c    | 99 | 1  | x
d    | 1  | 0  | 1
x ∈ {30, 40, 50, 60, 70, 80, 99}

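The 99 macro-blocks of an I-frame follow from the image size, given that an MPEG-2 macro-block covers 16 x 16 luminance pixels. A short check; the function name is an assumption:

```python
MACROBLOCK = 16  # an MPEG-2 macro-block covers 16 x 16 luminance pixels


def macroblocks_per_frame(width: int, height: int, size: int = MACROBLOCK) -> int:
    """Number of macro-blocks in a frame whose dimensions are multiples of `size`."""
    if width % size or height % size:
        raise ValueError("frame dimensions must be multiples of the macro-block size")
    return (width // size) * (height // size)


print(macroblocks_per_frame(176, 144))  # 11 columns x 9 rows = 99
```
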
Results for MPEG-2 Decoder
• Time unit = 1 kCycle
• Accuracy results based on confidence levels of 0.95

Process | Throughput
VLD     | 0.063 (rel. error ≤ 0.036%)
IDCT    | 0.063 (rel. error ≤ 0.036%)
MC      | 0.00106 (rel. error ≤ 0.190%)
RC      | 0.00106 (rel. error ≤ 0.191%)

Latency between successive firings:
Process | Max. | Average                      | Variance
VLD     | 710  | 15.99 (rel. error ≤ 0.031%)  | 75.38 (rel. error ≤ 0.18%)
IDCT    | 698  | 15.99 (rel. error ≤ 0.031%)  | 56.45 (rel. error ≤ 4.99%)
MC      | 3305 | 940.3 (rel. error ≤ 0.017%)  | 2.4·10^5 (rel. error ≤ 3.46%)
RC      | 2216 | 940.3 (rel. error ≤ 0.017%)  | 1.5·10^5 (rel. error ≤ 4.99%)

Channel memory between processes (occupancy):
Channel      | Maximum | Time-average                 | Time-variance
VLD and IDCT | 9       | 1.910 (rel. error ≤ 0.064%)  | 0.528 (rel. error ≤ 1.99%)
IDCT and RC  | 154     | 60.19 (rel. error ≤ 0.178%)  | 671.8 (rel. error ≤ 4.55%)
VLD and MC   | 133     | 34.73 (rel. error ≤ 0.517%)  | 698.4 (rel. error ≤ 4.39%)
MC and RC    | 1       | 0.577 (rel. error ≤ 0.561%)  | 0.244 (rel. error ≤ 3.27%)

Design flow
• Run-time
  - Combine Pareto points
    · exploit Pareto algebra
  - QoS management / scalable application

Mapping multiple jobs
Multiple jobs can be active simultaneously.
• When can a second job start?
• Are the requested resources available?
• If not, can the quality level be lowered?
• If not, can other jobs go for a lower quality?
• If yes, independent from other jobs?
• How to give guarantees?
[Figure: resources (up to 100%) versus time for jobs T0, T1 and T2, with reconfiguration intervals]

Combining Pareto points
[Figure: cost versus cycle budget curves for Application 1, Application 2 and Application 3; 80 of the 100 available cycles are in use]
• A new thread frame is coming
• A cycle budget of 20 is available

Combining Pareto points
[Figure: same cost versus cycle budget curves; Application 3 is given the remaining cycle budget of 20]
Feasible, but optimal?

Combining Pareto points
[Figure: same cost versus cycle budget curves; lowering a running application below its budget of 80 gives a cost increase Δ1, raising the budget of Application 3 from 20 to 40 gives a cost decrease Δ2]
Δ2 > Δ1: a better solution.

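The trade-off in these three slides can be sketched as a small search: choose one (cycle budget, cost) point per application so that the budgets fit and the total cost is minimal. The curves below are hypothetical values chosen to mirror the figure (budgets 80 and 20 versus 60 and 40 out of 100), and the exhaustive search is an assumption that only suits a handful of points.

```python
from itertools import product


def best_assignment(curves, total_budget):
    """Pick one (budget, cost) point per application, minimising total cost within the budget."""
    names = list(curves)
    best = None
    for choice in product(*(curves[name] for name in names)):
        budget = sum(b for b, _ in choice)
        cost = sum(c for _, c in choice)
        if budget <= total_budget and (best is None or cost < best[0]):
            best = (cost, dict(zip(names, choice)))
    return best  # None when no combination fits


# Hypothetical Pareto curves: (cycle budget, cost) points per application.
curves = {
    "app1": [(80, 10), (60, 14)],  # giving up 20 cycles costs 4 (delta 1)
    "app3": [(20, 30), (40, 20)],  # 20 extra cycles save 10 (delta 2)
}
print(best_assignment(curves, 100))
```

Keeping app1 at 80 and giving app3 the remaining 20 is feasible at total cost 40, but moving both applications along their curves reaches 34, because the cost decrease exceeds the cost increase.
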
Open issues
• Gap between specification and architecture modeling
• High-level modeling
  - use of a modeling pattern library
• Incorporate multiple Pareto solutions into DSE
  - Pareto algebra
• Get synthesis correct for
  - control applications including compute-intensive tasks
  - mapping to multi-processor
• Managing QoS
  - Scenario detection, merging, prediction and exploitation
  - Run-time resource manager optimizing overall quality
  - Measuring overall quality

Open issues (cont'd)
• Architecture modeling
  - how to deal with local memory (scratch pad / cache)
• Modeling scheduling and arbitration
  - make things composable!
• Definition of NAL (run-time services)
• Automatic partitioning
  - e.g., the SPRINT tool of IMEC is a good start (C to SystemC)
• VLSI tiling
• … and many more, e.g. see: Ogras et al., "Key research problems in NoC design: a holistic perspective", CODES+ISSS 2005
