Instruction-based System-level Power Evaluation of System-on-a-chip Peripheral Cores

Tony Givargis, Frank Vahid*
Dept. of Computer Science & Engineering, University of California, Riverside

Joerg Henkel
NEC C&C Research, Princeton, New Jersey

*Also with the Center for Embedded Computer Systems, UC Irvine

This work was supported by the National Science Foundation under grant # CCR 9876006, and by a Design Automation Conference graduate scholarship.

System-on-a-chip (SOC)
- Want to explore alternative cores, parameter settings, and applications
- Gate/RT-level simulation is too slow
[Figure: SOC containing a microprocessor, cache, memory, bridge, Peripheral 1 and Peripheral 2; alternative applications (Application 1, Application 2); core database offering Peripheral 1, Peripheral 2_a, Peripheral 2_b, ….]

SOC System-level Power Estimation
- Marculescu/Pedram 96: instruction trace reduction
- Simunic/Benini/De Micheli 99: extended instruction simulator
- Givargis/Vahid/Henkel 99: trace reductions
- Still need a system-level method for peripherals
- Tiwari/Malik/Wolfe 94: instruction set simulator
- Plus cache, memory & bus
- 3-step method
[Figure: SOC system-level model and SOC gate-level model, each showing microprocessor, application, cache, memory, bridge, and peripherals.]

Core Provider’s Step 1: Instruction-based System-level Model Creation
- System simulation model is already commonly used, and is required in the VSIA standard
- Executes ~1000x faster than the gate-level model
[Figure: UART system-level model with instructions Reset(), Enable_tx(), Enable_rx(), Send(), Receive(); core database containing UART, JPEG decode, ….]

Core Provider’s Step 2: Low-level Per-instruction Power Evaluation
- Measure power of the gate/layout model, per instruction
- Use a unique testbench per instruction; may take hours/days
- Low-level model differentiates cores from other SOC modules, enabling accurate power estimation
- Must account for core parameters

Energy per UART instruction across buffer sizes of 2, 4, 8, and 16 bytes:
  Reset:     13 J, 14 J
  Enable_tx: 23 J, 25 J, 24 J
  Enable_rx: 18 J, 19 J
  Send:      76 J, 77 J, 89 J, 115 J
  Receive:   44 J, 49 J, 55 J, 64 J
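
A minimal Python sketch of how such a parameterized energy table might be stored and queried. The dictionary layout and the function name `instruction_energy` are assumptions for illustration, not part of the original method. Only the Send and Receive rows are used, since all four buffer-size values for them appear on the slide; the values are assumed to be listed in column order (2, 4, 8, 16 bytes).

```python
# Per-instruction energy, indexed by the core parameter (buffer size in bytes).
# Values are taken from the slide's table; units are as printed there.
# Assumption: the four values per row are listed in the order 2, 4, 8, 16 bytes.
ENERGY_BY_BUFFER = {
    "Send":    {2: 76, 4: 77, 8: 89, 16: 115},
    "Receive": {2: 44, 4: 49, 8: 55, 16: 64},
}


def instruction_energy(instruction, buffer_size):
    """Look up the measured energy for one instruction at one buffer size."""
    try:
        return ENERGY_BY_BUFFER[instruction][buffer_size]
    except KeyError:
        raise ValueError(
            f"no measurement for {instruction!r} with buffer size {buffer_size}"
        )


# Example: the same Send instruction costs more with a larger buffer.
send_small = instruction_energy("Send", 2)
send_large = instruction_energy("Send", 16)
```

A lookup keyed by parameter value keeps the costly low-level measurements separate from the fast system-level model, which only needs to read the table.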

Core Provider’s Step 3: Back Annotation of System Model

UART instruction energy:
  Reset:     13 J
  Enable_tx: 23 J
  Enable_rx: 18 J
  Send:      76 J
  Receive:   44 J

Back-annotated system-level model:
  Reset()     … uJtot += 13
  Enable_tx() … uJtot += 23
  Enable_rx() … uJtot += 18
  Send()      … uJtot += 76
  Receive()   … uJtot += 44
[Figure: core database containing UART, JPEG decode, ….]
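
The back annotation above can be sketched in Python as follows. This is an illustrative sketch, not the authors' model: the class name `UartModel`, the functional behavior of each instruction (flags, `tx_log`, `rx_queue`), and the method names are hypothetical. Only the energy values and the accumulate-per-instruction idea come from the slide.

```python
class UartModel:
    """Hypothetical instruction-based UART model with energy back annotation."""

    # Per-instruction energy values from the slide (units as printed there).
    ENERGY = {"Reset": 13, "Enable_tx": 23, "Enable_rx": 18, "Send": 76, "Receive": 44}

    def __init__(self):
        self.uJtot = 0          # running energy total, as on the slide
        self.tx_enabled = False
        self.rx_enabled = False
        self.tx_log = []        # placeholder for transmitted bytes
        self.rx_queue = []      # placeholder for bytes waiting to be read

    def reset(self):
        self.tx_enabled = False
        self.rx_enabled = False
        self.uJtot += self.ENERGY["Reset"]

    def enable_tx(self):
        self.tx_enabled = True
        self.uJtot += self.ENERGY["Enable_tx"]

    def enable_rx(self):
        self.rx_enabled = True
        self.uJtot += self.ENERGY["Enable_rx"]

    def send(self, byte):
        if self.tx_enabled:
            self.tx_log.append(byte)
        self.uJtot += self.ENERGY["Send"]

    def receive(self):
        self.uJtot += self.ENERGY["Receive"]
        if self.rx_enabled and self.rx_queue:
            return self.rx_queue.pop(0)
        return None


# Usage: every instruction call adds its annotated energy to the total.
uart = UartModel()
uart.reset()
uart.enable_tx()
uart.send(0x41)
total = uart.uJtot  # 13 + 23 + 76
```

Because the annotation is a single addition per instruction, it adds almost nothing to the run time of the system-level simulation.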

Core “Power Modes” Require Extra Effort by Core Provider
- Unlike a microprocessor, certain peripheral core instructions can greatly modify the power consumption of other instructions
- Must create a power-mode transition function, and measure power per instruction per mode

Mode 1: Idle (energy per instruction across buffer sizes of 2, 4, 8, and 16 bytes)
  Reset:     11 J, 13 J, 14 J
  Enable_tx: 27 J, 32 J, 31 J
  Enable_rx: 17 J, 18 J, 19 J, 18 J
  Send:      17 J, 19 J, 20 J
  Receive:   14 J, 15 J, 17 J, 18 J

Mode 2: Enabled (energy per instruction across buffer sizes of 2, 4, 8, and 16 bytes)
  Reset:     13 J, 14 J
  Enable_tx: 23 J, 25 J, 24 J
  Enable_rx: 18 J, 19 J
  Send:      76 J, 77 J, 89 J, 115 J
  Receive:   44 J, 49 J, 55 J, 64 J

Mode transitions:
  Mode 1: Idle -> Mode 2: Enabled on Enable_tx or Enable_rx
  Mode 2: Enabled -> Mode 1: Idle on Reset
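
A minimal Python sketch of a two-mode energy model with a transition function. Several points are assumptions rather than statements from the slide: the names `next_mode` and `total_energy` are hypothetical; each instruction is charged at the energy of the mode in which it is issued, before the transition takes effect; and each table entry uses the first value listed for that instruction, taken here to be the 2-byte buffer configuration.

```python
IDLE, ENABLED = "Idle", "Enabled"

# Energy per instruction per mode (units as printed on the slide).
# Assumption: the first value listed per row is the 2-byte buffer value.
ENERGY = {
    IDLE:    {"Reset": 11, "Enable_tx": 27, "Enable_rx": 17, "Send": 17, "Receive": 14},
    ENABLED: {"Reset": 13, "Enable_tx": 23, "Enable_rx": 18, "Send": 76, "Receive": 44},
}


def next_mode(mode, instruction):
    """Power-mode transition function read from the slide's state diagram."""
    if instruction in ("Enable_tx", "Enable_rx"):
        return ENABLED
    if instruction == "Reset":
        return IDLE
    return mode  # Send and Receive leave the mode unchanged


def total_energy(trace, mode=IDLE):
    """Sum energy over an instruction trace, tracking the power mode."""
    total = 0
    for instruction in trace:
        # Assumption: charge the instruction in the mode it is issued from.
        total += ENERGY[mode][instruction]
        mode = next_mode(mode, instruction)
    return total, mode


# The same Send costs 76 while enabled but 17 after a Reset returns to idle.
energy, final_mode = total_energy(["Enable_tx", "Send", "Reset", "Send"])
```

A single-mode model would have to charge both Send instructions the same amount, which is the kind of error the power-mode concept is meant to remove.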

User Performs System Simulation, Which Yields Power Data
- Simulation takes only seconds or minutes
[Figure: SOC (microprocessor, application, cache, memory, bridge, peripheral) assembled from the core database (UART, JPEG decode, ….); the simulation outputs total energy.]

Results: Image-decode Accelerator
- Examined 3 peripheral cores: UART, DMA, JPEG
- Compared our instruction-based system-level method with:
  - Gate-level simulation: slow but accurate
  - “Databook” RT-level: cycle-accurate simulation, using databook average-power values
- Simulation time: gate-level 40,980 sec; “Databook” RT-level 2,700 sec; instruction-based system-level 14 sec
[Figure: bar chart of energy (mJ) per core for gate-level, “Databook” RT-level, and instruction-based system-level, with error labels. UART: 113, 155 (37%), 115 (2%). Other two cores (DMA, JPEG): 1573, 1793 (14%), 1550 (1%) and 519, 717 (38%), 493 (5%).]
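
As a quick check of the reported simulation times, the speedups can be computed directly. The dictionary and function names below are illustrative only; the three times are the ones given on the slide.

```python
# Simulation times reported on the slide, in seconds.
SIM_TIME_SEC = {
    "gate-level": 40980,
    "databook RT-level": 2700,
    "instruction-based system-level": 14,
}


def speedup(baseline, method):
    """Ratio of baseline simulation time to the given method's time."""
    return SIM_TIME_SEC[baseline] / SIM_TIME_SEC[method]


vs_gate = speedup("gate-level", "instruction-based system-level")
vs_databook = speedup("databook RT-level", "instruction-based system-level")
```

With these numbers the instruction-based method is roughly three orders of magnitude faster than gate-level simulation and about two orders faster than the “Databook” RT-level simulation.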

Results: Importance of Power Modes
- Proper power-mode selection is critical for peripheral cores
- Too few modes or the wrong modes can lead to large error

UART example (gate-level energy: 113 mJ)
  Modes         System-level energy (mJ)   Error
  Single mode   86                         23.0%
  Two modes     104                        8.6%
  Four modes    115                        1.7%

Conclusions
- The instruction-based method introduced here:
  - Is accurate (less than 5% error)
  - Is fast (1000x speedup over gate-level)
  - Fits with current core-based methodology
- The concept of power modes is necessary for accuracy
- Future work includes:
  - Trace-simulator-based approach (10x speedup)
  - Trace-analysis-based approach (100x speedup)