![Parameterized Embedded Systems Platforms](https://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-1.jpg)
Parameterized Embedded Systems Platforms
Frank Vahid
Students: Tony Givargis, Roman Lysecky, Susan Cotterell
Dept. of Computer Science and Engineering, University of California, Riverside
Member, Center for Embedded Computer Systems, UC Irvine
Supported by: NSF, NEC
The Dalton Project
![Outline](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-2.jpg)
Outline
• Introduction
• Parameterized SOC platforms
• Exploring parameter configurations
• Future direction: self-optimizing platforms
• Conclusions
![Introduction](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-3.jpg)
Introduction
• Advent of system-on-a-chip
Figure labels: Board; Microproc. IC; Memory IC; Peripher. IC; FPGA IC; IC; Microprocessor core (aka “IP”); Peripheral core
![System-on-a-chip (SOC)](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-4.jpg)
System-on-a-chip (SOC)
![The Productivity Gap [ITRS 99]](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-5.jpg)
The Productivity Gap [ITRS 99]
![Programmable Platforms](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-6.jpg)
Programmable Platforms
Figure labels (ITRS 99): Programmable Platform; Microprocessor; Cache; Memory; DMA; Bridge; FPGA; Peripheral; System bus; Peripheral bus
• Pre-fabricated IC, synthesizable HDL, or both
  – “reference designs” (VLSI), “silicon platforms” (Philips), “fig chips” (Vahid/Givargis 99)
![Targeted to Embedded Systems](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-7.jpg)
Targeted to Embedded Systems
• May drive future architecture design [Patterson 98]
• Varied power/performance/size constraints
  – Programmable platforms must adapt
![Adapting platforms to constraints](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-8.jpg)
Adapting platforms to constraints
• One solution: Architectural Parameters
Figure labels: Application 1 and Application 2 (each a main() containing a while (…) { … } loop); Programmable Platform with Microprocessor, Cache, Memory, DMA, Bridge, FPGA, Peripheral, System bus, Peripheral bus
![Related work](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-9.jpg)
Related work
• Microcontrollers
• VLSI’s Velocity
• Pleiades project [Rabaey 97]
• Microprocessor + FPGA
• Philips’ Y-Chart approach
Figure labels (Y-Chart): Architecture; Applications; Mapping; Analysis; Numbers; Our focus
![Outline](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-10.jpg)
Outline
• Introduction
• Parameterized SOC platforms
• Exploring parameter configurations
• Future direction: self-optimizing platforms
• Conclusions
![Basic parameters -- cache](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-11.jpg)
Basic parameters -- cache
Figure labels: Programmable Platform; Microprocessor; Cache; Memory; DMA; Bridge; FPGA; Peripheral; System bus; Peripheral bus
![Basic parameters -- cache](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-12.jpg)
Basic parameters -- cache
• Line Size
• Associativity
• Cache Size
Figure labels: address fields Tag, Index, Offset; cache arrays with V, T, D columns; tag comparators (==); Mux; Data
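As a rough illustration of how the three cache parameters interact, the sketch below splits an address into the Tag, Index, and Offset fields shown on the slide. The function name `split_address`, the byte-addressed memory, and the power-of-two sizes are assumptions made for this example; they are not taken from the slides.

```python
def split_address(addr, size, line, assoc):
    """Split addr into (tag, index, offset) for a set-associative cache.

    Assumes size, line, and assoc are powers of two, with size and line
    in bytes (an assumption for this sketch).
    """
    num_sets = size // (line * assoc)       # sets = cache size / (line size * ways)
    offset_bits = line.bit_length() - 1     # log2(line)
    index_bits = num_sets.bit_length() - 1  # log2(num_sets)
    offset = addr & (line - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 1 KB cache, 16-byte lines, 2-way: 32 sets, 4 offset bits, 5 index bits
print(split_address(0x1234, size=1024, line=16, assoc=2))  # (9, 3, 4)
```

Growing the line size widens the offset field, while raising associativity at a fixed cache size reduces the number of sets and so narrows the index field.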
![Basic parameters -- bus](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-13.jpg)
Basic parameters -- bus
Figure labels: Programmable Platform; Microprocessor; Cache; Memory; DMA; Bridge; FPGA; Peripheral; System bus; Peripheral bus
![Basic parameters -- Bus](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-14.jpg)
Basic parameters -- Bus
• Change Bus Width [Givargis 98]
Figure labels: C1; Mux; Bus; Demux; C2; C1 > C2
![Basic parameters -- Bus](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-15.jpg)
Basic parameters -- Bus
• Encode data to reduce switching (Bus Invert) [Stan 95]
Figure labels: Encoder; Decoder; invert_ctrl; example bit patterns comparing Binary Encoding (Hamming Dist = 6) with Bus-Invert Encoding (Hamming Dist = 3)
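The bus-invert idea can be sketched in a few lines. This is a minimal illustration, not the encoder used in the project: it assumes the data lines start at zero, compares each new word only against the data lines (not the invert line), and does not count toggles on the `invert_ctrl` line itself. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return (bus_value, invert_ctrl) pairs for a sequence of data words."""
    mask = (1 << width) - 1
    bus = 0  # assumed initial state of the data lines
    encoded = []
    for word in words:
        word &= mask
        # Send the complement when more than half of the lines would toggle.
        if 2 * hamming(bus, word) > width:
            bus, invert_ctrl = ~word & mask, 1
        else:
            bus, invert_ctrl = word, 0
        encoded.append((bus, invert_ctrl))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, invert_ctrl) pairs."""
    mask = (1 << width) - 1
    return [(~value & mask) if inv else value for value, inv in encoded]


def data_line_toggles(values, start=0):
    """Total number of data-line transitions for a sequence of bus values."""
    toggles, previous = 0, start
    for value in values:
        toggles += hamming(previous, value)
        previous = value
    return toggles


words = [0xFF, 0x00, 0xF0]
encoded = bus_invert_encode(words, width=8)
print(data_line_toggles(words))                    # binary encoding: 20
print(data_line_toggles([v for v, _ in encoded]))  # bus-invert data lines: 4
```

With this rule, no single transfer toggles more than half of the data lines, at the cost of one extra control wire.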
![Parameter definitions](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-16.jpg)
Parameter definitions
• Parameter
  – An architectural feature that can be varied, with a small set of possible values, without changing the application’s essential functionality.
• Configuration
  – A selection of a particular value for every architecture parameter.
• Static vs. dynamic parameter
  – Static: Value is set before fabricating the IC.
  – Dynamic: Value is set after fabricating the IC.
![Potential tradeoffs experiment](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-17.jpg)
Potential tradeoffs experiment [ICCAD 99]
Figure labels: Microprocessor; I-cache; D-cache; Memory; DMA; Bridge; FPGA; Peripheral; System bus; Peripheral bus

| Component | Parameter | Possible values |
|---|---|---|
| I-cache | Size | 32k, 16k, 8k, 4k, 2k, 1k, 512, 256, 128 |
| I-cache | Line | 8, 16, 32 |
| I-cache | Associativity | 2, 4, 8 |
| D-cache | Size | 32k, 16k, 8k, 4k, 2k, 1k, 512, 256, 128 |
| D-cache | Line | 8, 16, 32 |
| D-cache | Associativity | 2, 4, 8 |
| Mp-c bus | Data bus width | 4, 8, 16, 32 |
| Mp-c bus | Data bus invert | on or off |
| Sys. bus | Data bus width | 4, 8, 16, 32 |
| Sys. bus | Data bus invert | on or off |

![Potential tradeoffs experiment](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-18.jpg)
Potential tradeoffs experiment [ICCAD 99]
• Cache: Dinero [Edler, Hill]
• ISS: [Tiwari 96]
Figure labels: C Program; Microprocessor Instr. Set Simulator; Cache Simulator; Memory Simulator; Bus simulator; Power; Total power
![Potential tradeoffs experiment](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-19.jpg)
Potential tradeoffs experiment
• Computed power for all 45,568 configurations
  – For each of four C applications
  – Used microprocessor, cache, and bus simulators (1 wk CPU)
• Tradeoff between performance and power
  – X-axis: execution time (sec)
  – Y-axis: power (watt)
![Potential tradeoffs experiment](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-20.jpg)
Potential tradeoffs experiment
• Bus: 32-1/32-0; I: 16k, 4, 4; D: 16k, 4, 4; .086 sec, 43.6 W, 20 kG
• Bus: 16-1/32-1; I: 16k, 8, 16; D: 32k, 8, 8; .389 sec, 11.4 W, 21 kG
• Bus: 8-1/32-1; I: 32k, 8, 8; D: 16k, 8, 16; .995 sec, 3.4 W, 30 K
• Narrower bus required a larger cache size
![Potential tradeoffs experiment](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-21.jpg)
Potential tradeoffs experiment
• Performance varied by 11x
• Power varied by 13x
• Area varied by 1x
• Energy consumption varied by 2x
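The performance and power ratios can be checked against the three labeled configurations on the preceding slide (.086 sec at 43.6 W, .389 sec at 11.4 W, .995 sec at 3.4 W). The sketch below computes each spread, taking energy as power times execution time. It covers only those three points, so its energy spread need not match the 2x reported here, which presumably reflects a larger set of configurations. The helper name `spread` is hypothetical.

```python
# (label, execution time in seconds, power in watts) as read from the slide
points = [
    ("Bus 32-1/32-0", 0.086, 43.6),
    ("Bus 16-1/32-1", 0.389, 11.4),
    ("Bus 8-1/32-1", 0.995, 3.4),
]


def spread(values):
    """Ratio between the largest and smallest value."""
    return max(values) / min(values)


times = [t for _, t, _ in points]
powers = [p for _, _, p in points]
energies = [t * p for _, t, p in points]  # energy (J) = power (W) * time (s)

print(f"performance spread: {spread(times):.1f}x")   # 11.6x
print(f"power spread:       {spread(powers):.1f}x")  # 12.8x
print(f"energy spread:      {spread(energies):.1f}x (three points only)")  # 1.3x
```

The fastest configuration draws the most power, yet the energy per run stays within a much narrower band than either metric alone.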
![Potential tradeoffs experiment](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-22.jpg)
Potential tradeoffs experiment
• Bus: 32-1/32-1; I: 1k, 4, 4; D: 512, 4, 8; 2 ms, .19 W, 15 kG
• Bus: 16-1/32-1; I: 1k, 4, 4; D: 512, 4, 8; 3 ms, .07 W, 17 kG
• Bus: 8-1/4-0; I: 1k, 2, 4; D: 512, 2, 4; 5 ms, .02 W, 18 kG
![Potential tradeoffs experiment](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-23.jpg)
Potential tradeoffs experiment
• Performance varied by 2.5x
• Power varied by 9.5x
• Area varied by 1x
• Energy consumption varied by 4x
![Potential tradeoffs experiment](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-24.jpg)
Potential tradeoffs experiment
• How much variation in total system power and performance can we obtain just by varying the cache and bus parameters?
  – 9 to 14x improvement in power/performance
• How interdependent are these two types of parameters?
  – Fixing cache param. values, then selecting bus param. values, results in non-optimal solutions
![Many more parameters possible](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-25.jpg)
Many more parameters possible
• Some examples include:
  – Code compression (Henkel/Wolf)
  – Address bus encoding
  – Multiple levels of memory hierarchy
  – CPU parameters (e.g., voltage scale, DP width)
  – Peripheral core parameters (our current focus)
• Fertile research area
• Can yield even larger tradeoffs if we:
  – Create parameter-aware compiler
  – Adapt OS?
![Outline](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-26.jpg)
Outline
• Introduction
• Parameterized SOC platforms
• Exploring parameter configurations
• Future direction: self-optimizing platforms
• Conclusions
![Exploring parameter configurations](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-27.jpg)
Exploring parameter configurations
• Low-level simulation
  – Gate-level simulation
    • Far too slow, days per configuration
  – RT-level simulation
    • Still slow, hours per configuration
• Our approach
  – System-level simulation
    • Minutes per configuration
  – System-level trace simulation
    • Seconds per configuration
  – System-level trace analysis
    • Milliseconds per configuration
![Evaluation by gate-level simulation](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-28.jpg)
Evaluation by gate-level simulation
• Capture each core in HDL, synthesize, simulate
• Hours (often tens) per configuration
Figure labels: Reconfigure; Programmable Platform (Microprocessor, Cache, Memory, DMA, Bridge, FPGA, Peripheral, System bus, Peripheral bus); HDL synthesis; HDL simulation; Total power
![Evaluation by system-level simulation](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-29.jpg)
Evaluation by system-level simulation
• Minutes-per-configuration
• Contrast with hours-per-config.
Figure labels: C Program; Trace Generator; OO models (Microprocessor Instr. Set Simulator, Cache Simulator, Memory Simulator, Bridge Simulator, DMA Simulator, Peripheral Simulator, Bus simulator, Peripheral bus); Power; Total power; Reconfigure
![Evaluation by trace-simulation](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-30.jpg)
Evaluation by trace-simulation
• Seconds-per-configuration
  – Get traces from small # of system simulations
• Note that the cache simulator is non-functional
• Same approach for others
Figure labels: C Program; Trace Generator; Instr. trace; Address trace; Bus trace; OO non-fct. models (Instr. trace Simulator, Cache trace Simulator, Memory trace Simulator, Bridge trace Simulator, DMA trace Simulator, Peripheral trace Simulator, Bus trace simulator); Power; Total power; Reconfigure
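A trace-driven cache model of this kind can be quite small because it tracks only tags, never data (taken here to be what "non-functional" means on this slide). The sketch below counts misses for an address trace under a given configuration. The LRU replacement policy, the function name `count_misses`, and the example trace are assumptions for illustration; the slides do not state which policy was simulated.

```python
from collections import OrderedDict


def count_misses(trace, size, line, assoc):
    """Count misses for a list of byte addresses in a set-associative cache.

    Non-functional model: only tags are stored. Replacement is LRU
    (an assumption for this sketch).
    """
    num_sets = size // (line * assoc)
    sets = [OrderedDict() for _ in range(num_sets)]  # tag -> None, in LRU order
    misses = 0
    for addr in trace:
        block = addr // line
        index = block % num_sets
        tag = block // num_sets
        ways = sets[index]
        if tag in ways:
            ways.move_to_end(tag)          # mark as most recently used
        else:
            misses += 1
            if len(ways) >= assoc:
                ways.popitem(last=False)   # evict the least recently used tag
            ways[tag] = None
    return misses


trace = [0, 4, 8, 64, 0, 128, 64]
print(count_misses(trace, size=64, line=8, assoc=2))   # 5
print(count_misses(trace, size=128, line=8, assoc=4))  # 4
```

Because the trace is collected once, re-running this function with different `size`, `line`, and `assoc` values evaluates a new configuration without repeating the system simulation.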
![System simulation vs. trace simulation](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-31.jpg)
System simulation vs. trace simulation
Figure labels (system simulation): Parameter evaluation; System level model (uP, DMA, UART); Execute; Power
Figure labels (trace simulation): System level model (uP, DMA, UART); Execute; Traces; Parameter evaluation; Trace simulators; Power
![Evaluation by trace-analysis](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-32.jpg)
Evaluation by trace-analysis
• Milliseconds-per-configuration
• Further speedup
  – Statistically characterize traces
  – Still only small # of system simulations
Figure labels: C Program; Trace Generator; Instr. stats.; Address stats.; Bus stats.; Equations (Instr. trace analyzer, Cache trace analyzer, Memory trace analyzer, Bridge trace analyzer, DMA trace analyzer, Peripheral trace analyzer, Bus trace simulator); Power; Total power; Reconfigure
![Trace-analysis approach for cache](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-33.jpg)
Trace-analysis approach for cache
• Given a trace of memory refs
• Cache parameters
  – Size (S)
  – Line/block-size (L)
  – Associativity (A)
• Compute # of misses (N)
Figure labels: Size (S)
![Trace-analysis approach for cache](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-34.jpg)
Trace-analysis approach for cache
![Trace-analysis approach for cache](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-35.jpg)
Trace-analysis approach for cache
• Capture improvements obtainable by:
  – changing line-size at small/large values of cache-size
  – changing associativity at small/large values of cache-size
![Trace-analysis approach for bus](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-36.jpg)
Trace-analysis approach for bus
Figure labels: capacitance; Num transfers per item; Random data; Bus width; Items/second
![Trace-analysis approach for bus](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-37.jpg)
Trace-analysis approach for bus
• Bus equation:
  – m items/second (denotes the traffic N on the bus)
  – n bits/item
  – k bit wide bus
  – bus-invert encoding
  – random data assumption
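The bus equation itself does not survive in this transcript, so the sketch below is not a reconstruction of it. It is a first-order estimate built from the same inputs (m items/second, n bits/item, a k-bit bus, random data): an item needs ceil(n/k) transfers, a binary-encoded k-bit bus toggles k/2 lines per transfer on average, and with bus-invert the data lines toggle min(H, k − H) times, where H is the Hamming distance to the previous bus value. All function names are hypothetical, and toggles on the invert control line are deliberately left out.

```python
from math import comb


def transfers_per_item(n, k):
    """Bus transfers needed to move one n-bit item over a k-bit bus."""
    return -(-n // k)  # ceiling division


def binary_toggle_rate(m, n, k):
    """Expected line toggles per second with binary encoding and random data."""
    return m * transfers_per_item(n, k) * (k / 2)


def expected_bus_invert_data_toggles(k):
    """Expected data-line toggles per transfer with bus-invert and random data.

    H ~ Binomial(k, 1/2); the encoder sends whichever of the word or its
    complement toggles fewer lines, i.e. min(H, k - H).
    """
    return sum(comb(k, h) * min(h, k - h) for h in range(k + 1)) / 2 ** k


print(transfers_per_item(32, 8))            # 4
print(binary_toggle_rate(1000, 32, 8))      # 16000.0
print(expected_bus_invert_data_toggles(8))  # 2.90625
```

Under these assumptions an 8-bit bus averages 4 toggles per transfer with binary encoding and about 2.9 on the data lines with bus-invert, before accounting for the extra control wire.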
![Trace-analysis experiments](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-38.jpg)
Trace-analysis experiments
• Cache parameters
  – size: 128, 256, 512, 1k, 2k, 4k, 8k, 16k, 32k
  – assoc: 2, 4, 8
  – line: 8, 16, 32
• Bus Parameters
  – width: 4, 8, 16, 32
  – code: binary/bus-invert
• Analyzed 45K sets exhaustively for each of 4 examples.
Figure labels: CPU; I-Cache; D-Cache; Bus A; Memory; Bridge; Bus B; Peripheral Bus; Peripheral 1; Peripheral 2; Peripheral n
![Experiment Results](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-39.jpg)
Experiment Results
• Diesel application’s performance
• Blue (light-gray) is system-simulation-based
• Red (dark-gray) is trace-analysis-based
• 4% error, 320x faster
![Experiment Results](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-40.jpg)
Experiment Results
• Diesel application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 2% error, 420x faster
![Experiment Results](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-41.jpg)
Experiment Results
• CKey application’s performance
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 8% error, 125x faster
![Experiment Results](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-42.jpg)
Experiment Results
• CKey application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 3% error, 125x faster
![Experiment Results](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-43.jpg)
Experiment Results
• 125-400x speedup
• 1-18% absolute error (power & performance)
• 2% average power error
Figure labels: Time (hours); Power Error (%)
![Techniques for general cores](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-44.jpg)
Techniques for general cores
• Earlier experiments were for uP/cache/bus
• System simulation for other cores (ISSS’00)
  – Isolate “instructions” in system-level model
  – Gate-level simulation per instruction
  – Back-annotate system-level model’s instructions
  – Similar to technique for microprocessors, but:
    • Must consider “power modes”
![Trace approach for general cores](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-45.jpg)
Trace approach for general cores
Figure labels: System level model (uP, DMA, UART); Execute; Traces; Parameter evaluation; Trace simulators; Power
• Full trace: Reset --; Quantize P1, P2, …, P64; IDCT P1, P2, …, P64
• Reduced trace with instructions only: Reset --; Quantize --; IDCT --
• Reduced trace with characterized data: Reset --; Quantize .80; IDCT .72; Quantize .93; IDCT .63
• Reduced trace with instruction frequencies: Reset *1; Quantize *2; IDCT *2
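The reduction from a full trace to instruction frequencies can be sketched as below. The trace contents mirror the slide (one Reset, two Quantize, two IDCT, each data-carrying instruction with 64 operands), but the operand values, the per-instruction energy numbers, and all function names are made up for illustration; in the approach described, per-instruction figures would come from gate-level simulation.

```python
from collections import Counter

# Full trace: (instruction, operands). Operand values here are placeholders.
full_trace = [
    ("Reset", []),
    ("Quantize", list(range(64))),
    ("IDCT", list(range(64))),
    ("Quantize", list(range(64))),
    ("IDCT", list(range(64))),
]


def instructions_only(trace):
    """Reduced trace that keeps instruction names and drops the data."""
    return [name for name, _ in trace]


def instruction_frequencies(trace):
    """Reduced trace that keeps only how often each instruction occurs."""
    return Counter(name for name, _ in trace)


def estimate_energy(frequencies, energy_per_instruction):
    """Sum of (count * per-instruction energy); units follow the table given."""
    return sum(count * energy_per_instruction[name]
               for name, count in frequencies.items())


# Hypothetical per-instruction energies, for illustration only.
energy_per_instruction = {"Reset": 1.0, "Quantize": 5.0, "IDCT": 8.0}

freqs = instruction_frequencies(full_trace)
print(dict(freqs))                                     # {'Reset': 1, 'Quantize': 2, 'IDCT': 2}
print(estimate_energy(freqs, energy_per_instruction))  # 27.0
```

Each reduction step shrinks the trace and speeds up evaluation, but discards data-dependent effects, which is consistent with the larger errors reported for the most reduced traces.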
![Experiments with general cores: JPEG](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-46.jpg)
Experiments with general cores: JPEG
• Trace file size (Kb), for pixel sizes of 10 and 12 bits: ftrc 32, 39; rtrc_cd 3.6; rtrc_i 0.5, 0.5
• CPU time for power evaluation (sec), for pixel sizes of 10 and 12 bits: gate 290000, 330000; sys 48, 49; ftrc 26, 27; rtrc_cd 4.9, 5.1; rtrc_i 4.6
• Average speedup over gate: sys 6K; ftrc 12K; rtrc_cd 62K; rtrc_i 67K

| pixel size (bits) | gate (mJ) | ftrc (mJ) | ftrc error | rtrc_cd (mJ) | rtrc_cd error | rtrc_i (mJ) | rtrc_i error |
|---|---|---|---|---|---|---|---|
| 10 | 420 | 443 | 5% | 451 | 7% | 491 | 17% |
| 12 | 531 | 569 | 7% | 576 | 8% | 632 | 19% |
| average error | | | 6% | | 7.5% | | 18% |

![Experiments with general cores: UART](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-47.jpg)
Experiments with general cores: UART
![Outline](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-48.jpg)
Outline
• Introduction
• Parameterized SOC platforms
• Exploring parameter configurations
• Future direction: self-optimizing platforms
• Conclusions
![Future directions](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-49.jpg)
Future directions
• Earlier work
  – Used software on workstation to explore parameter configurations
• “Self-optimizing” platform
  – Can we build the exploration ability into the platform itself?
  – Transparent to the user
    • Ease of use, more accurate metrics, wider acceptance
  – “Embedded CAD”
Figure labels (earlier work): Workstation; Exploration sw; Configuration; Platform
Figure labels (self-optimizing): Workstation; Regular binary; Platform; Exploration ability
![Conclusions](http://slidetodoc.com/presentation_image_h2/bf7b51680b92896e202b29598c1dfff5/image-50.jpg)
Conclusions
• Parameters can improve usefulness of programmable platforms
  – by adapting platform to particular application and to power/performance constraints
• Good tradeoff range even for basic parameters
• Fast and accurate evaluation seems possible
• Much work remains
  – More parameters
  – Better exploration
  – Self-optimizing platforms