Parameterized Systems-on-a-Chip
Frank Vahid, Tony Givargis, Roman Lysecky, Leslie Tauro, Susan Cotterell
Department of Computer Science and Engineering, University of California, Riverside
Supported by: NSF, NEC, DAC scholarship
The Dalton Project

Outline
• Introduction
• Parameterized Systems-on-a-Chip
• Exploring Parameter Configurations
• Conclusions

Introduction
• Advent of system-on-a-chip
[Figure: a board built from separate ICs (Microproc. IC, Memory IC, Peripher. IC, FPGA IC) alongside a single IC containing a microprocessor core (aka “IP”) and a peripheral core]

System-on-a-chip (SOC)

The Productivity Gap [ITRS 99]

Programmable Platforms
[Figure: programmable platform (ITRS 99) with microprocessor, cache, memory, DMA, bridge, FPGA, and programmable peripheral on a system bus and a peripheral bus]
• Pre-fabricated IC, synthesizable HDL, or both
  – “reference designs” (VLSI), “silicon platforms” (Philips), “fig chips” (Vahid/Givargis 99)

Targeted to Embedded Systems
• May drive future architecture design [Patterson 98]
• Varied power/performance/size constraints
  – Programmable platforms must adapt

Adapting platforms to constraints
• One solution: Architectural Parameters
[Figure: Application 1 and Application 2, each a main() containing a while (…) loop, shown with the programmable platform (microprocessor, cache, memory, DMA, bridge, FPGA, programmable peripheral, system bus, peripheral bus)]

Related work
• Microcontrollers
• VLSI’s Velocity
• Pleiades project [Rabaey 97]
• Microprocessor + FPGA
• Philips’ Y-Chart approach
[Figure: Y-chart with Architecture, Applications, Mapping, Analysis, and Numbers; “Our focus” is marked on the chart]

Basic parameters -- cache
[Figure: platform block diagram with microprocessor, cache, memory, DMA, bridge, FPGA, programmable peripheral, system bus, and peripheral bus]

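Basic parameters -- cache
• Line Size
• Associativity
• Cache Size
[Figure: set-associative cache lookup. The address splits into Tag, Index, and Offset; each way holds valid (V), tag (T), and data (D) fields; tag comparators (==) drive a mux that selects the data]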
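
As a rough illustration of how the three cache parameters interact, the sketch below derives the address fields for one configuration. It assumes sizes and line sizes are in bytes, every quantity is a power of two, and addresses are 32 bits wide; none of these are stated on the slide, and the function names are hypothetical.

```python
def cache_fields(size, line, assoc, addr_bits=32):
    """Return (offset_bits, index_bits, tag_bits) for a set-associative cache.

    Assumption: size and line are in bytes and all values are powers of two.
    """
    num_sets = size // (line * assoc)
    if num_sets < 1 or num_sets * line * assoc != size:
        raise ValueError("size must be a positive multiple of line * assoc")
    offset_bits = line.bit_length() - 1      # log2(line)
    index_bits = num_sets.bit_length() - 1   # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return offset_bits, index_bits, tag_bits


def split_address(addr, size, line, assoc, addr_bits=32):
    """Split an address into (tag, index, offset) for the given configuration."""
    offset_bits, index_bits, _ = cache_fields(size, line, assoc, addr_bits)
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    # Doubling associativity at a fixed size halves the number of sets,
    # so one bit moves from the index field to the tag field.
    print(cache_fields(8192, 16, 4))
    print(cache_fields(8192, 16, 8))
```

Growing the line size widens the offset field, and growing the associativity at a fixed cache size shrinks the index field, which is why the three parameters cannot be evaluated in isolation.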

Basic parameters -- bus
[Figure: platform block diagram with microprocessor, cache, memory, DMA, bridge, FPGA, programmable peripheral, system bus, and peripheral bus]

Basic parameters -- bus
• Change bus width [Givargis 98]
[Figure: a wide bus labelled C1 and a narrower bus labelled C2 placed between a mux and a demux; C1 > C2]

Basic parameters -- bus
• Encode data to reduce switching (Bus Invert) [Stan 95]
[Figure: an encoder and a decoder on either side of the bus, with an extra invert_ctrl line. In the example, binary encoding gives a Hamming distance of 6 between successive bus values and bus-invert encoding gives 3]
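
The bus-invert idea can be sketched as follows. This is a minimal illustration, not the encoder used in the experiments: the initial bus state (all lines low), the decision rule that counts the invert line as one more bus line, and all function names are assumptions. The example word is chosen so that it reproduces the 6-versus-3 Hamming distances of the slide's example, but the bit patterns themselves are illustrative.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Encode words for a bus with `width` data lines plus one invert line.

    Returns (encoded, toggles): encoded is a list of (lines, invert) pairs,
    toggles is the number of line transitions caused by each transfer.
    """
    mask = (1 << width) - 1
    lines, invert = 0, 0  # assumed initial bus state
    encoded, toggles = [], []
    for word in words:
        word &= mask
        # Cost of sending the word as-is, including dropping invert to 0.
        cost_plain = hamming(lines, word) + invert
        if 2 * cost_plain > width + 1:
            new_lines, new_invert = ~word & mask, 1  # send the complement
        else:
            new_lines, new_invert = word, 0
        toggles.append(hamming(lines, new_lines) + (invert ^ new_invert))
        lines, invert = new_lines, new_invert
        encoded.append((new_lines, new_invert))
    return encoded, toggles


def bus_invert_decode(encoded, width):
    """Recover the original words from (lines, invert) pairs."""
    mask = (1 << width) - 1
    return [(~lines & mask) if inv else lines for lines, inv in encoded]


def binary_toggles(words, width):
    """Per-transfer transitions when words are sent unencoded."""
    mask = (1 << width) - 1
    prev, out = 0, []
    for word in words:
        word &= mask
        out.append(hamming(prev, word))
        prev = word
    return out


if __name__ == "__main__":
    print(binary_toggles([0xFC], 8))      # unencoded transfer
    print(bus_invert_encode([0xFC], 8))   # complemented transfer plus invert line
```

With this rule no transfer toggles more than half of the width + 1 physical lines, at the cost of one extra wire and the encoder/decoder logic.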

Parameter definitions
• Parameter
  – An architectural feature that can be varied, with a small set of possible values, without changing the application’s essential functionality.
• Configuration
  – A selection of a particular value for every architecture parameter.
• Static vs. dynamic parameter
  – Static: Value is set before fabricating the IC.
  – Dynamic: Value is set after fabricating the IC.

Potential tradeoffs experiment [ICCAD 99]
[Figure: platform with microprocessor, I-cache, D-cache, memory, DMA, bridge, peripherals, FPGA, system bus, and peripheral bus]

Component   Parameter         Possible values
I-cache     Size              32k, 16k, 8k, 4k, 2k, 1k, 512, 256, 128
            Line              8, 16, 32
            Associativity     2, 4, 8
D-cache     Size              32k, 16k, 8k, 4k, 2k, 1k, 512, 256, 128
            Line              8, 16, 32
            Associativity     2, 4, 8
Mp-c bus    Data bus width    4, 8, 16, 32
            Data bus invert   on or off
Sys. bus    Data bus width    4, 8, 16, 32
            Data bus invert   on or off
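
A configuration, in the sense defined earlier, is one value picked for every parameter in this table. The sketch below enumerates the raw cross-product of the listed values. Reading "k" as 1024 and treating every combination as valid are assumptions. The unpruned cross-product is larger than the 45,568 configurations reported for the experiment, and the slides do not say how the space was narrowed, so this should not be read as reproducing that count.

```python
from itertools import product

# Values copied from the table; "k" is assumed to mean 1024.
CACHE_SIZES = [128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768]
LINE_SIZES = [8, 16, 32]
ASSOCIATIVITIES = [2, 4, 8]
BUS_WIDTHS = [4, 8, 16, 32]
BUS_INVERT = [False, True]


def cache_options():
    """All (size, line, associativity) triples for one cache."""
    return list(product(CACHE_SIZES, LINE_SIZES, ASSOCIATIVITIES))


def bus_options():
    """All (width, invert) pairs for one bus."""
    return list(product(BUS_WIDTHS, BUS_INVERT))


def configurations():
    """Iterate over (i_cache, d_cache, mpc_bus, sys_bus) tuples."""
    return product(cache_options(), cache_options(),
                   bus_options(), bus_options())


def raw_count():
    """Size of the unpruned cross-product."""
    return len(cache_options()) ** 2 * len(bus_options()) ** 2


if __name__ == "__main__":
    print(raw_count())
    print(next(iter(configurations())))
```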

Potential tradeoffs experiment [ICCAD 99]
• Cache: Dinero [Edler, Hill]
• ISS: [Tiwari 96]
[Figure: simulation flow. A C program runs on a microprocessor instruction-set simulator feeding cache, memory, and bus simulators; each reports power, and the results are combined into total power]

Potential tradeoffs experiment
• Computed power for all 45,568 configurations
  – For each of four C applications
  – Used microprocessor, cache, and bus simulators (1 wk CPU)
• Tradeoff between performance and power
  – X-axis: execution time (sec)
  – Y-axis: power (watt)

Potential tradeoffs experiment
[Plot: power versus execution time, with three labelled configurations]
• Bus: 32-1/32-0, I: 16k, 4, 4, D: 16k, 4, 4: 0.086 sec, 43.6 W, 20 kG
• Bus: 16-1/32-1, I: 16k, 8, 16, D: 32k, 8, 8: 0.389 sec, 11.4 W, 21 kG
• Bus: 8-1/32-1, I: 32k, 8, 8, D: 16k, 8, 16: 0.995 sec, 3.4 W, 30 K
• Narrower bus required a larger cache size

Potential tradeoffs experiment
• Performance varied by 11x
• Power varied by 13x
• Area varied by 1x
• Energy consumption varied by 2x
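
The performance and power spreads can be checked against the three configurations labelled on the preceding plot (0.086 sec at 43.6 W, 0.389 sec at 11.4 W, 0.995 sec at 3.4 W). The sketch below computes the max/min ratio along each axis and filters for Pareto-optimal points. It uses only those three points, so the other summary figures, which cover the full design space, are not reproduced; the function names are hypothetical.

```python
# (execution time in seconds, power in watts) for the three labelled points.
POINTS = [(0.086, 43.6), (0.389, 11.4), (0.995, 3.4)]


def spread(values):
    """Ratio between the largest and smallest value."""
    values = list(values)
    return max(values) / min(values)


def pareto_front(points):
    """Keep points that no other point beats or ties on both axes."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    print(round(spread(t for t, _ in POINTS), 1))  # execution-time spread
    print(round(spread(p for _, p in POINTS), 1))  # power spread
    print(pareto_front(POINTS))
```

All three labelled points are mutually non-dominated: each faster configuration costs more power, which is the tradeoff the plot illustrates. The computed spreads are roughly in line with the 11x and 13x figures.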

Potential tradeoffs experiment
[Plot: power versus execution time, with three labelled configurations]
• Bus: 32-1/32-1, I: 1k, 4, 4, D: 512, 4, 8: 2 ms, 0.19 W, 15 kG
• Bus: 16-1/32-1, I: 1k, 4, 4, D: 512, 4, 8: 3 ms, 0.07 W, 17 kG
• Bus: 8-1/4-0, I: 1k, 2, 4, D: 512, 2, 4: 5 ms, 0.02 W, 18 kG

Potential tradeoffs experiment
• Performance varied by 2.5x
• Power varied by 9.5x
• Area varied by 1x
• Energy consumption varied by 4x
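
Energy is average power multiplied by execution time, so the energy spread can be estimated from the three configurations labelled on the preceding plot (2 ms at 0.19 W, 3 ms at 0.07 W, 5 ms at 0.02 W). This is a small worked check using only those three points; the helper name is hypothetical.

```python
# (execution time in milliseconds, power in watts) for the labelled points.
POINTS_MS_W = [(2.0, 0.19), (3.0, 0.07), (5.0, 0.02)]


def energy_mj(time_ms, power_w):
    """Energy in millijoules: milliseconds times watts."""
    return time_ms * power_w


if __name__ == "__main__":
    energies = [energy_mj(t, p) for t, p in POINTS_MS_W]
    for (t, p), e in zip(POINTS_MS_W, energies):
        print(f"{t:.0f} ms at {p:.2f} W -> {e:.2f} mJ")
    print(f"energy spread: {max(energies) / min(energies):.1f}x")
```

The slowest configuration here is also the most energy-efficient: its power drops faster than its execution time grows, giving a spread of a little under 4x across the three points.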

Potential tradeoffs experiment
• How much variation in total system power and performance can we obtain just by varying the cache and bus parameters?
  – 9 to 14x improvement in power/performance
• How interdependent are these two types of parameters?
  – Fixing cache parameter values, then selecting bus parameter values, results in non-optimal solutions

Many more parameters possible
• Some examples include:
  – Code compression
  – Address bus encoding
  – Multiple levels of memory hierarchy
  – CPU parameters (e.g., voltage scale, DP width)
  – Peripheral core parameters (our current focus)
• Fertile research area
• Can yield even larger tradeoffs if we:
  – Create parameter-aware compiler
  – Adapt OS?

Evaluation by gate-level simulation
• Capture each core in HDL, synthesize, simulate
• Hours (often tens) per configuration
[Figure: a reconfigure loop around HDL synthesis and HDL simulation of the platform, producing total power]

Evaluation by system-level simulation
• Minutes-per-configuration
  – Contrast with hours-per-configuration
[Figure: a C program drives OO models of each core (microprocessor instruction-set simulator, trace generator, cache, memory, bridge, DMA, bus, peripheral bus, and peripheral simulators); each model reports power, which is combined into total power inside a reconfigure loop]

Evaluation by trace-simulation
• Seconds-per-configuration
  – Get traces from a small number of system simulations
• Note that the cache simulator is non-functional
• Same approach for others
[Figure: a C program and trace generator produce instruction, address, and bus traces; OO non-functional models (instruction, cache, memory, DMA, bridge, bus, and peripheral trace simulators) each report power, which is combined into total power inside a reconfigure loop]

Evaluation by trace-analysis
• Further speedup
  – Statistically characterize traces
  – Still only a small number of system simulations
• Milliseconds-per-configuration
[Figure: a C program and trace generator produce instruction, address, and bus statistics; equation-based trace analyzers (instruction, cache, memory, DMA, bridge, bus, peripheral) each report power, which is combined into total power inside a reconfigure loop]

Trace-analysis approach for cache
• Given a trace of memory refs
• Cache parameters
  – Size (S)
  – Line/block-size (L)
  – Associativity (A)
• Compute # of misses (N)
[Plot: axis label Size (S)]
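
For reference, the quantity N can be obtained directly by replaying the trace through a small set-associative cache model, which is the slower baseline that the equation-based analysis is meant to replace. The sketch below is such a baseline, not the authors' analyzer. The LRU replacement policy, byte-addressed references, and the function name are assumptions.

```python
from collections import OrderedDict


def count_misses(trace, size, line, assoc):
    """Count misses for a trace of byte addresses.

    size and line are in bytes; replacement is LRU (an assumption, the
    slides do not name a policy).
    """
    num_sets = size // (line * assoc)
    if num_sets < 1:
        raise ValueError("size must be at least line * assoc")
    # One ordered dict per set; key order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line
        index = block % num_sets
        tag = block // num_sets
        ways = sets[index]
        if tag in ways:
            ways.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) >= assoc:
                ways.popitem(last=False)  # evict least recently used
            ways[tag] = True
    return misses


if __name__ == "__main__":
    sequential = list(range(0, 64, 4))    # sixteen word-sized references
    for line in (8, 16, 32):
        print(line, count_misses(sequential, 128, line, 2))
```

For a purely sequential trace, doubling the line size halves the miss count, while for conflicting addresses it is the associativity that matters; a trace analyzer has to capture both effects without replaying the trace for every configuration.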

Trace-analysis approach for cache
• Capture improvements obtainable by:
  – changing line-size at small/large values of cache-size
  – changing associativity at small/large values of cache-size

Trace-analysis approach for bus
[Figure labels: capacitance, num transfers per item, random data, bus width, items/second]

Trace-analysis approach for bus
• Bus equation:
• m items/second (denotes the traffic N on the bus)
• n bits/item
• k bit wide bus
• bus-invert encoding
• random data assumption
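
The bus equation itself did not survive in this transcript, so the following is not the authors' equation. It is a first-order sketch of how the listed quantities (m, n, k, the encoding, and the random-data assumption) could combine into an expected switching rate. It assumes an even bus width, uniformly random and independent data, and a bus-invert rule that treats the invert line as one more bus line, in which case the number of lines differing from the previous state follows a Binomial(k + 1, 1/2) distribution in steady state. All function names are hypothetical.

```python
from math import ceil, comb


def transfers_per_item(n_bits, k_width):
    """Bus transfers needed to move one n-bit item over a k-bit bus."""
    return ceil(n_bits / k_width)


def expected_toggles_binary(k_width):
    """Expected line transitions per transfer for unencoded random data."""
    return k_width / 2


def expected_toggles_bus_invert(k_width):
    """Expected transitions per transfer with bus-invert on random data.

    With k data lines plus the invert line, the encoder toggles
    min(d, k + 1 - d) lines, where d ~ Binomial(k + 1, 1/2).
    Assumes k is even, so the inversion decision never ties.
    """
    lines = k_width + 1
    total = sum(comb(lines, d) * min(d, lines - d) for d in range(lines + 1))
    return total / 2 ** lines


def toggle_rate(m_items_per_sec, n_bits, k_width, bus_invert=False):
    """Expected line transitions per second on the bus."""
    per_transfer = (expected_toggles_bus_invert(k_width) if bus_invert
                    else expected_toggles_binary(k_width))
    return m_items_per_sec * transfers_per_item(n_bits, k_width) * per_transfer


if __name__ == "__main__":
    for k in (4, 8, 16, 32):
        print(k, toggle_rate(1000, 32, k), toggle_rate(1000, 32, k, True))
```

Dynamic bus power would then scale with this toggle rate times the per-line capacitance and the square of the supply voltage; those electrical values are not given in the slides and are left out of the sketch.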

Trace-analysis experiments
• Cache parameters
  – size: 128, 256, 512, 1k, 2k, 4k, 8k, 16k, 32k
  – assoc: 2, 4, 8
  – line: 8, 16, 32
• Bus parameters
  – width: 4, 8, 16, 32
  – code: binary/bus-invert
• Analyzed 45K sets exhaustively for each of 4 examples.
[Figure labels: CPU, Bus A, I-Cache, D-Cache, Memory, Bridge, Bus B, Peripheral Bus, Peripheral 1, Peripheral 2, ..., Peripheral n]

Experiment Results
• Diesel application’s performance
• Blue (light-gray) is system-simulation-based
• Red (dark-gray) is trace-analysis-based
• 4% error, 320x faster

Experiment Results
• Diesel application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 2% error, 420x faster

Experiment Results
• CKey application’s performance
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 8% error, 125x faster

Experiment Results
• CKey application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 3% error, 125x faster

Experiment Results
• 125-400x speedup
• 1-18% absolute error (power & performance)
• 2% average power error
[Plot axes: Time (hours), Power Error (%)]

Conclusions
• Parameters can improve usefulness of programmable platforms
  – by adapting platform to particular application and to power/performance constraints
• Good tradeoff range even for basic parameters
• Fast and accurate evaluation seems possible
• Future research
  – Fast evaluation techniques for general cores
  – More parameters for SOCs, static and dynamic
  – Couple with parameter-aware compilers
  – Dynamic re-configuration (adapt)