




































Parameterized Embedded Systems Platforms
Frank Vahid
Students: Tony Givargis, Roman Lysecky, Susan Cotterell
Dept. of Computer Science and Engineering, University of California, Riverside
Member, Center for Embedded Computer Systems, UC Irvine
Supported by: NSF, NEC
The Dalton Project

Outline
• Introduction
• Parameterized SOC platforms
• Exploring parameter configurations
• Future direction: self-optimizing platforms
• Conclusions

Introduction
• Advent of system-on-a-chip
[Figure: a board carrying separate ICs (Microproc. IC, Memory IC, Peripher. IC, FPGA IC) contrasted with a single IC containing a microprocessor core (aka “IP”) and peripheral cores]

System-on-a-chip (SOC)
![The Productivity Gap [ITRS 99]](https://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-5.jpg)
The Productivity Gap [ITRS 99]

Programmable Platforms
[Figure (ITRS 99): platform with microprocessor, cache, memory, DMA, bridge, FPGA, and programmable peripheral, connected by a system bus and a peripheral bus]
• Pre-fabricated IC, synthesizable HDL, or both
  – “reference designs” (VLSI), “silicon platforms” (Philips), “fig chips” (Vahid/Givargis 99)
![Targeted to Embedded Systems](https://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-7.jpg)
Targeted to Embedded Systems
• May drive future architecture design [Patterson 98]
• Varied power/performance/size constraints
  – Programmable platforms must adapt

Adapting platforms to constraints
• One solution: Architectural Parameters
[Figure: Application 1 and Application 2 (each a main() containing a while (…) loop) running on the same platform of microprocessor, cache, memory, DMA, bridge, FPGA, programmable peripheral, system bus, and peripheral bus]
![Related work](https://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-9.jpg)
Related work
• Microcontrollers
• VLSI’s Velocity
• Pleiades project [Rabaey 97]
• Microprocessor + FPGA
• Philips’ Y-Chart approach
[Figure: Y-chart with Architecture, Applications, Mapping, Analysis, and Numbers; one part is marked “Our focus”]

Outline
• Introduction
• Parameterized SOC platforms
• Exploring parameter configurations
• Future direction: self-optimizing platforms
• Conclusions

Basic parameters -- cache
[Figure: platform block diagram (microprocessor, cache, memory, DMA, bridge, FPGA, programmable peripheral, system bus, peripheral bus)]

Basic parameters -- cache
• Line Size
• Associativity
• Cache Size
[Figure: cache lookup. The address is split into Tag, Index, and Offset; each entry holds V, T, and D fields; two comparators (==) feed a mux that outputs Data]
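To make the three cache parameters concrete, here is a minimal sketch of how an address could be split into tag, index, and offset for a given size, line size, and associativity. The function name `split_address` and the assumption that sizes are in bytes are illustrative choices, not taken from the slides.

```python
def split_address(addr, cache_size, line_size, assoc):
    """Split a byte address into (tag, index, offset).

    Assumes (not stated in the slides) that cache_size and line_size
    are in bytes and that the cache is set-associative.
    """
    num_sets = cache_size // (line_size * assoc)
    if num_sets < 1:
        raise ValueError("cache too small for this line size and associativity")
    offset = addr % line_size                 # byte within the line
    index = (addr // line_size) % num_sets    # which set the line maps to
    tag = addr // (line_size * num_sets)      # remaining high-order bits
    return tag, index, offset


if __name__ == "__main__":
    # 1 KB cache, 16-byte lines, 2-way: 32 sets.
    print(split_address(0x1234, cache_size=1024, line_size=16, assoc=2))
```

Doubling the associativity at a fixed cache size halves the number of sets, which moves one bit from the index into the tag.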

Basic parameters -- bus
[Figure: platform block diagram (microprocessor, cache, memory, DMA, bridge, FPGA, programmable peripheral, system bus, peripheral bus)]
![Basic parameters -- bus: change bus width](https://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-14.jpg)
Basic parameters -- bus
• Change Bus Width [Givargis 98]
[Figure: a bus with capacitance C1 and a bus with mux/demux and capacitance C2; C1 > C2]

Basic parameters -- bus
• Encode data to reduce switching (Bus Invert) [Stan 95]
[Figure: encoder and decoder at the two ends of the bus with an extra invert_ctrl line; an example transfer has Hamming distance 6 under binary encoding and 3 under bus-invert encoding]
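The bus-invert idea can be sketched in a few lines. This is a minimal illustration of the scheme as it is usually described (send the complemented word and raise the invert line whenever sending the word unchanged would toggle more than half the lines); the function names and the all-zero initial bus state are assumptions made for the example.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return a list of (data, invert_ctrl) bus states for the given words."""
    mask = (1 << width) - 1
    prev_data, prev_inv = 0, 0           # assumed initial bus state
    states = []
    for w in words:
        w &= mask
        # Toggles if w were sent unchanged (invert line driven to 0).
        dist = hamming(prev_data, w) + prev_inv
        if dist > width / 2:
            data, inv = (~w) & mask, 1   # send complement, raise invert_ctrl
        else:
            data, inv = w, 0
        states.append((data, inv))
        prev_data, prev_inv = data, inv
    return states


def bus_invert_decode(states, width):
    mask = (1 << width) - 1
    return [(~d) & mask if inv else d for d, inv in states]


def count_toggles(states):
    """Total line transitions, including the invert line, from an all-zero bus."""
    toggles, prev = 0, (0, 0)
    for data, inv in states:
        toggles += hamming(prev[0], data) + (prev[1] ^ inv)
        prev = (data, inv)
    return toggles


if __name__ == "__main__":
    words = [0x00, 0xFF, 0x00, 0xFF]
    plain = [(w, 0) for w in words]
    coded = bus_invert_encode(words, 8)
    print(count_toggles(plain), count_toggles(coded))
```

The alternating pattern in the usage example is a worst case for plain binary encoding, so it shows the benefit clearly; on random data the saving is much smaller.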

Parameter definitions
• Parameter
  – An architectural feature that can be varied, with a small set of possible values, without changing the application’s essential functionality.
• Configuration
  – A selection of a particular value for every architecture parameter
• Static vs. dynamic parameter
  – Static: Value is set before fabricating the IC.
  – Dynamic: Value is set after fabricating the IC.
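These definitions map naturally onto a small data structure. The sketch below is illustrative only: the class, the parameter names, and in particular the static/dynamic flags are hypothetical placeholders, since the slides do not say which parameters are static or dynamic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    name: str
    values: tuple    # the small set of possible values
    dynamic: bool    # False: set before fabrication; True: set after


# Hypothetical parameter set; the dynamic flags are arbitrary placeholders.
PARAMETERS = (
    Parameter("dcache_size", (128, 256, 512, 1024), dynamic=False),
    Parameter("bus_width", (4, 8, 16, 32), dynamic=False),
    Parameter("bus_invert", (False, True), dynamic=True),
)


def is_configuration(choice, parameters=PARAMETERS):
    """True if `choice` selects one allowed value for every parameter and nothing else."""
    names = {p.name for p in parameters}
    return set(choice) == names and all(
        choice[p.name] in p.values for p in parameters
    )


if __name__ == "__main__":
    print(is_configuration({"dcache_size": 512, "bus_width": 16, "bus_invert": True}))
```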
![Potential tradeoffs experiment: platform and parameters](https://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-17.jpg)
Potential tradeoffs experiment [ICCAD 99]
[Figure: microprocessor with I-cache and D-cache, memory, DMA, bridge, FPGA, and peripheral on a system bus and a peripheral bus]

| Component | Parameter | Possible values |
| --- | --- | --- |
| I-cache | Size | 32k, 16k, 8k, 4k, 2k, 1k, 512, 256, 128 |
| I-cache | Line | 8, 16, 32 |
| I-cache | Associativity | 2, 4, 8 |
| D-cache | Size | 32k, 16k, 8k, 4k, 2k, 1k, 512, 256, 128 |
| D-cache | Line | 8, 16, 32 |
| D-cache | Associativity | 2, 4, 8 |
| Mp-c bus | Data bus width | 4, 8, 16, 32 |
| Mp-c bus | Data bus invert | on or off |
| Sys. bus | Data bus width | 4, 8, 16, 32 |
| Sys. bus | Data bus invert | on or off |
![Potential tradeoffs experiment: simulation flow](https://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-18.jpg)
Potential tradeoffs experiment [ICCAD 99]
• Cache: Dinero [Edler, Hill]
• ISS: [Tiwari 96]
[Figure: a C program runs on the microprocessor instruction-set simulator, which drives the cache simulator, memory simulator, and bus simulator; their power figures are summed into total power]

Potential tradeoffs experiment
• Computed power for all 45,568 configurations
  – For each of four C applications
  – Used microprocessor, cache, and bus simulators (1 wk CPU)
[Figure: tradeoff between performance and power; X-axis: execution time (sec), Y-axis: power (watt)]

Potential tradeoffs experiment
Labelled points on the plot:
• Bus: 32-1/32-0; I: 16k, 4, 4; D: 16k, 4, 4: .086 sec, 43.6 W, 20 kG
• Bus: 16-1/32-1; I: 16k, 8, 16; D: 32k, 8, 8: .389 sec, 11.4 W, 21 kG
• Bus: 8-1/32-1; I: 32k, 8, 8; D: 16k, 8, 16: .995 sec, 3.4 W, 30 K
Narrower bus required a larger cache size

Potential tradeoffs experiment
• Performance varied by 11x
• Power varied by 13x
• Area varied by 1x
• Energy consumption varied by 2x
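As a quick arithmetic check, the spreads can be recomputed from the three labelled points on the preceding slide, with energy taken as power times execution time. This is only a sketch: the three points need not span the full range of all configurations, so the energy spread computed here can be smaller than the slide's overall figure. The variable and function names are illustrative.

```python
# (bus setting, execution time in seconds, power in watts), read from the
# three labelled points on the preceding slide.
points = [
    ("32-1/32-0", 0.086, 43.6),
    ("16-1/32-1", 0.389, 11.4),
    ("8-1/32-1", 0.995, 3.4),
]


def spread(values):
    """Ratio of the largest value to the smallest."""
    return max(values) / min(values)


times = [t for _, t, _ in points]
powers = [p for _, _, p in points]
energies = [t * p for _, t, p in points]   # joules = seconds * watts

if __name__ == "__main__":
    print(f"performance spread: {spread(times):.1f}x")
    print(f"power spread:       {spread(powers):.1f}x")
    print(f"energy spread:      {spread(energies):.1f}x (these three points only)")
```

The time and power spreads of these three points come out close to the 11x and 13x quoted on the slide.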

Potential tradeoffs experiment
Labelled points on the plot:
• Bus: 32-1/32-1; I: 1k, 4, 4; D: 512, 4, 8: 2 ms, .19 W, 15 kG
• Bus: 16-1/32-1; I: 1k, 4, 4; D: 512, 4, 8: 3 ms, .07 W, 17 kG
• Bus: 8-1/4-0; I: 1k, 2, 4; D: 512, 2, 4: 5 ms, .02 W, 18 kG

Potential tradeoffs experiment
• Performance varied by 2.5x
• Power varied by 9.5x
• Area varied by 1x
• Energy consumption varied by 4x

Potential tradeoffs experiment
• How much variation in total system power and performance can we obtain just by varying the cache and bus parameters?
  – 9 to 14x improvement in power/performance
• How interdependent are these two types of parameters?
  – Fixing cache parameter values, then selecting bus parameter values, results in non-optimal solutions

Many more parameters possible
• Some examples include:
  – Code compression (Henkel/Wolf)
  – Address bus encoding
  – Multiple levels of memory hierarchy
  – CPU parameters (e.g., voltage scale, DP width)
  – Peripheral core parameters (our current focus)
• Fertile research area
• Can yield even larger tradeoffs if we:
  – Create parameter-aware compiler
  – Adapt OS?

Outline
• Introduction
• Parameterized SOC platforms
• Exploring parameter configurations
• Future direction: self-optimizing platforms
• Conclusions

Exploring parameter configurations
• Low-level simulation
  – Gate-level simulation
    • Far too slow, days per configuration
  – RT-level simulation
    • Still slow, hours per configuration
• Our approach
  – System-level simulation
    • Minutes per configuration
  – System-level trace simulation
    • Seconds per configuration
  – System-level trace analysis
    • Milliseconds per configuration
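To see why the per-configuration cost matters, the sketch below multiplies each cost by the 45,568 configurations mentioned earlier in the deck. The per-configuration figures (exactly one day, one hour, one minute, one second, one millisecond) are assumed round numbers standing in for the slide's orders of magnitude, not measured values.

```python
CONFIGS = 45_568   # configuration count quoted earlier in the deck

# Assumed round figures, in seconds per configuration.
PER_CONFIG_SECONDS = {
    "gate-level simulation": 86_400,          # "days"
    "RT-level simulation": 3_600,             # "hours"
    "system-level simulation": 60,            # "minutes"
    "system-level trace simulation": 1,       # "seconds"
    "system-level trace analysis": 0.001,     # "milliseconds"
}


def total_days(seconds_per_config, configs=CONFIGS):
    """Total exhaustive-exploration time in days for one application."""
    return seconds_per_config * configs / 86_400


if __name__ == "__main__":
    for method, cost in PER_CONFIG_SECONDS.items():
        print(f"{method:32s} {total_days(cost):14.4f} days")
```

Under these assumptions exhaustive gate-level exploration would take on the order of a century, while trace analysis finishes in under a minute.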

Evaluation by gate-level simulation
• Capture each core in HDL, synthesize, simulate
• Hours (often tens) per configuration
[Figure: the platform (microprocessor, cache, memory, DMA, bridge, FPGA, programmable peripheral, system bus, peripheral bus) goes through HDL synthesis and HDL simulation to yield total power; reconfigure and repeat]

Evaluation by system-level simulation
• Minutes-per-configuration
• Contrast with hours-per-config.
[Figure: a C program drives OO models of the microprocessor (instruction-set simulator), cache, memory, bridge, DMA, bus, peripheral bus, and peripheral; each simulator reports power, summed into total power; reconfigure and repeat]

Evaluation by trace-simulation
• Seconds-per-configuration
  – Get traces from small # of system simulations
• Note that the cache simulator is non-functional
• Same approach for others
[Figure: a C program feeds a trace generator that produces instruction, address, and bus traces; OO non-functional trace simulators for the instruction stream, cache, memory, DMA, bridge, bus, and peripheral each report power, summed into total power; reconfigure and repeat]
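A non-functional cache simulator only tracks which lines are resident and never stores data, so it can replay an address trace for each configuration cheaply. The sketch below counts misses for one configuration; the LRU replacement policy, byte-addressed sizes, and the function name `count_misses` are assumptions for illustration and are not specified in the slides.

```python
from collections import OrderedDict


def count_misses(addresses, cache_size, line_size, assoc):
    """Replay an address trace and return the number of misses.

    Non-functional: only tags are tracked, no data is stored.
    Assumes LRU replacement and sizes in bytes.
    """
    num_sets = cache_size // (line_size * assoc)
    if num_sets < 1:
        raise ValueError("cache too small for this line size and associativity")
    sets = [OrderedDict() for _ in range(num_sets)]   # tag -> None, in LRU order
    misses = 0
    for addr in addresses:
        block = addr // line_size
        ways = sets[block % num_sets]
        tag = block // num_sets
        if tag in ways:
            ways.move_to_end(tag)          # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) >= assoc:
                ways.popitem(last=False)   # evict least recently used line
            ways[tag] = None
    return misses


if __name__ == "__main__":
    trace = [0, 4, 8, 64, 0, 128, 64]
    print(count_misses(trace, cache_size=64, line_size=8, assoc=2))
```

The same trace can be replayed against every size, line, and associativity combination without rerunning the application.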

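System simulation vs. trace simulation
[Figure: in system simulation, the system-level model (µP, DMA, UART) is executed for every parameter evaluation to obtain power; in trace simulation, the system-level model is executed to produce traces, and trace simulators then perform the parameter evaluation to obtain power]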

Evaluation by trace-analysis
• Milliseconds-per-configuration
• Further speedup -- statistically characterize traces
  – Still only small # of system simulations
[Figure: a C program feeds a trace generator; instruction, address, and bus statistics feed trace analyzers (equations) for the instruction stream, cache, memory, DMA, bridge, bus, and peripheral; each reports power, summed into total power; reconfigure and repeat]

Trace-analysis approach for cache
• Given a trace of memory refs
• Cache parameters
  – Size (S)
  – Line/block-size (L)
  – Associativity (A)
• Compute # of misses (N)
[Figure: plot with Size (S) on one axis]

Trace-analysis approach for cache

Trace-analysis approach for cache
• Capture improvements obtainable by:
  – changing line-size at small/large values of cache-size
  – changing associativity at small/large values of cache-size

Trace-analysis approach for bus
[Figure labels: capacitance, num transfers per item, random data, bus width, items/second]

Trace-analysis approach for bus
• Bus equation:
  – m items/second (denotes the traffic N on the bus)
  – n bits/item
  – k bit wide bus
  – bus-invert encoding
  – random data assumption
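The bus equation itself did not survive in this transcript, so the following is not the authors' formula. It is a hedged stand-in showing how an estimate could be built from the listed quantities alone: each n-bit item needs ceil(n/k) transfers on a k-bit bus, and under the random-data assumption a plain binary bus toggles k/2 lines per transfer on average, while a bus-invert bus (k data lines plus one invert line) toggles min(d, k+1-d) lines, with d taken as binomially distributed over k+1 lines. All function names are hypothetical.

```python
from math import ceil, comb


def expected_toggles_per_transfer(k, bus_invert):
    """Average line transitions per transfer for uniformly random data."""
    if not bus_invert:
        return k / 2
    lines = k + 1   # k data lines plus the invert line
    total = sum(min(d, lines - d) * comb(lines, d) for d in range(lines + 1))
    return total / 2 ** lines


def toggles_per_second(m, n, k, bus_invert):
    """Estimated line transitions per second for m items/s of n bits each."""
    transfers_per_item = ceil(n / k)
    return m * transfers_per_item * expected_toggles_per_transfer(k, bus_invert)


if __name__ == "__main__":
    for k in (4, 8, 16, 32):
        plain = toggles_per_second(1000, 32, k, bus_invert=False)
        coded = toggles_per_second(1000, 32, k, bus_invert=True)
        print(f"k={k:2d}  binary={plain:8.1f}  bus-invert={coded:8.1f}")
```

Switching power would then scale with this toggle rate times the per-line capacitance and the square of the supply voltage; those factors are left out because the slides give no values for them.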

Trace-analysis experiments
• Cache parameters
  – size: 128, 256, 512, 1k, 2k, 4k, 8k, 16k, 32k
  – assoc: 2, 4, 8
  – line: 8, 16, 32
• Bus parameters
  – width: 4, 8, 16, 32
  – code: binary/bus-invert
• Analyzed 45K sets exhaustively for each of 4 examples.
[Figure: CPU, I-cache, D-cache, memory, and bridge on Bus A; Peripheral 1, Peripheral 2, ..., Peripheral n on Bus B (peripheral bus)]

Experiment Results
• Diesel application’s performance
• Blue (light-gray) is system-simulation-based
• Red (dark-gray) is trace-analysis-based
• 4% error, 320x faster

Experiment Results
• Diesel application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 2% error, 420x faster

Experiment Results
• CKey application’s performance
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 8% error, 125x faster

Experiment Results
• CKey application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 3% error, 125x faster

Experiment Results
• 125-400x speedup
• 1-18% absolute error (power & performance)
• 2% average power error
[Figure: charts of time (hours) and power error (%)]

Techniques for general cores
• Earlier experiments were for µP/cache/bus
• System simulation for other cores (ISSS’00)
  – Isolate “instructions” in system-level model
  – Gate-level simulation per instruction
  – Back-annotate system-level model’s instructions
  – Similar to technique for microprocessors, but:
    • Must consider “power modes”

Trace approach for general cores
[Figure: the system-level model (µP, DMA, UART) is executed to produce traces; trace simulators then perform parameter evaluation and report power]
• Full trace: Reset --; Quantize P1, P2, …, P64; IDCT P1, P2, …, P64
• Reduced trace with instructions only: Reset --; Quantize --; IDCT --
• Reduced trace with characterized data: Reset --; Quantize .80; IDCT .72; Quantize .93; IDCT .63
• Reduced trace with instruction frequencies: Reset *1; Quantize *2; IDCT *2
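The last reduction, instruction frequencies, can be illustrated with a short sketch: collapse a core's instruction trace into counts, then weight each count by a per-instruction energy. The per-instruction energies below are hypothetical placeholders in arbitrary units (in the approach described earlier they would come from gate-level simulation per instruction), and the function names are illustrative.

```python
from collections import Counter

# Instruction-only trace of a core, matching the frequencies shown above.
trace = ["Reset", "Quantize", "IDCT", "Quantize", "IDCT"]

# Hypothetical per-instruction energies in arbitrary units.
ENERGY_PER_INSTRUCTION = {"Reset": 1.0, "Quantize": 5.0, "IDCT": 8.0}


def reduce_to_frequencies(instructions):
    """Collapse a trace into instruction -> occurrence count."""
    return Counter(instructions)


def estimate_energy(frequencies, energy_table):
    """Sum of count * per-instruction energy; ignores data dependence."""
    return sum(count * energy_table[name] for name, count in frequencies.items())


if __name__ == "__main__":
    freqs = reduce_to_frequencies(trace)
    print(dict(freqs))
    print(estimate_energy(freqs, ENERGY_PER_INSTRUCTION))
```

Because this reduction discards the data each instruction operated on, it trades accuracy for a much smaller trace than the characterized-data variant.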

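Experiments with general cores: JPEG

Trace file size (Kb), for pixel sizes of 10 and 12 bits:
• ftrc: 32, 39
• rtrc_cd: 3.6
• rtrc_i: 0.5, 0.5

CPU time for power evaluation (sec), for pixel sizes of 10 and 12 bits:
• gate: 290000, 330000
• sys: 48, 49; average speedup 6K
• ftrc: 26, 27; average speedup 12K
• rtrc_cd: 4.9, 5.1; average speedup 62K
• rtrc_i: 4.6; average speedup 67K

Energy and error relative to gate level:

| pixel size (bits) | gate mJ | ftrc mJ | ftrc error | rtrc_cd mJ | rtrc_cd error | rtrc_i mJ | rtrc_i error |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | 420 | 443 | 5% | 451 | 7% | 491 | 17% |
| 12 | 531 | 569 | 7% | 576 | 8% | 632 | 19% |
| average error | | | 6% | | 7.5% | | 18% |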

Experiments with general cores: UART

Outline
• Introduction
• Parameterized SOC platforms
• Exploring parameter configurations
• Future direction: self-optimizing platforms
• Conclusions

Future directions
• Earlier work
  – Used software on workstation to explore parameter configurations
[Figure: exploration sw on the workstation sends a configuration to the platform]
• “Self-optimizing” platform
  – Can we build the exploration ability into the platform itself?
  – Transparent to the user
    • Ease of use, more accurate metrics, wider acceptance
  – “Embedded CAD”
[Figure: the workstation sends a regular binary to the platform, which contains the exploration ability]

Conclusions
• Parameters can improve usefulness of programmable platforms
  – by adapting platform to particular application and to power/performance constraints
• Good tradeoff range even for basic parameters
• Fast and accurate evaluation seems possible
• Much work remains
  – More parameters
  – Better exploration
  – Self-optimizing platforms