Processor Microarchitecture
[Figure: processor block diagram. Region labels: Fetch, Decode, Execute/Writeback, Memory, Network. Components shown: Instruction Cache, Instruction TLB, Branch Prediction, Fetch Queue, Instruction Decoder, Instruction Queue, Register Files, ALU, MUL, FPU, LD, ST, Data TLB, L1 Data Cache, L2 Data Cache, NoC Router, On-Chip Network.]
Energy/Power Calculation
• How do we calculate energy or power dissipation for a given microarchitecture?
• Energy/power varies between:
– Different ISAs: ARM vs Intel x86
– Different microarchitectures: in-order vs out-of-order
– Different applications: memory-bound vs compute-bound
– Different technologies: 90 nm vs 22 nm
– Different operating conditions: frequency, temperature
Architecture Activity (1)
Activity 1: Instruction Fetch
icache.read++; fbuffer.write++;
• Activity counts at each component differ between applications.
• Collect activity counts of each architecture component (through simulation or measurement).
• The list of components differs between microarchitectures.
[Figure: processor block diagram, as on the Processor Microarchitecture slide.]
Architecture Activity (2)
Activity 2: Instruction Decode
fbuffer.read++; idecoder.logic++;
• Read/write accesses to caches, buffers, etc.
• Logical accesses to logic blocks such as decoders, ALUs, etc.
• Tradeoff: differentiating more access types improves accuracy but adds complexity and slows simulation.
[Figure: processor block diagram, as on the Processor Microarchitecture slide.]
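The per-stage counter increments on the two Architecture Activity slides can be sketched as a small counting loop. This is a minimal illustration only: the counter names follow the slides, but the function names and the assumption of exactly one fetch and one decode per instruction (no stalls, no fetch groups, no mispredictions) are hypothetical simplifications, not how any particular simulator works.

```python
from collections import Counter


def fetch(counters):
    """Activity 1: fetch reads the instruction cache and writes the fetch buffer."""
    counters["icache.read"] += 1
    counters["fbuffer.write"] += 1


def decode(counters):
    """Activity 2: decode reads the fetch buffer and exercises the decoder logic."""
    counters["fbuffer.read"] += 1
    counters["idecoder.logic"] += 1


def simulate(num_instructions):
    """Hypothetical driver: one fetch and one decode per instruction."""
    counters = Counter()
    for _ in range(num_instructions):
        fetch(counters)
        decode(counters)
    return counters


counts = simulate(100)
for name, value in sorted(counts.items()):
    print(name, value)
```

Keeping read, write, and logic accesses as separate keys mirrors the accuracy-versus-speed tradeoff noted on the slide: more access types mean more counters to update on every simulated event.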
Power and Architecture Activity
• For example, at the nth clock cycle, the collected counters are:
– Data cache: read = 20, write = 12; per-read energy = 0.5 nJ; per-write energy = 0.6 nJ
• Read energy = read × per-read energy = 10 nJ
• Write energy = write × per-write energy = 7.2 nJ
• Total activity energy = read energy + write energy = 17.2 nJ
• If n = 50 (the 50th clock cycle) and clock frequency = 2 GHz, total activity power = energy × clock_freq / n = 688 mW
*Note: n / clock_freq is the duration of n clock periods in seconds; power is the time average of energy.
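The data-cache example above can be reproduced with a few lines of arithmetic. The sketch below uses the slide's numbers; the function and variable names are illustrative assumptions.

```python
def activity_energy_nj(counts, per_access_nj):
    """Sum of (access count x per-access energy) over all access types, in nJ."""
    return sum(counts[kind] * per_access_nj[kind] for kind in counts)


def average_power_w(energy_nj, cycles, clock_hz):
    """Average power in watts: energy divided by the elapsed time of `cycles` periods."""
    elapsed_s = cycles / clock_hz
    return energy_nj * 1e-9 / elapsed_s


# Values taken from the slide's data-cache example.
counts = {"read": 20, "write": 12}
per_access_nj = {"read": 0.5, "write": 0.6}

energy = activity_energy_nj(counts, per_access_nj)
power = average_power_w(energy, cycles=50, clock_hz=2e9)
print(f"{energy:.1f} nJ, {power * 1e3:.0f} mW")  # prints "17.2 nJ, 688 mW"
```

Dividing by the elapsed time n / clock_freq = 25 ns is the same as multiplying by clock_freq / n, which is the form used on the slide.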
Things to consider (1)
1. How do we calculate per-read/write energies?
• Per-access energies can be estimated from circuit-level designs and analyses.
• There are various open-source tools for this.
[Figure: tool flow. Inputs: Architecture Specification, Technology Parameters → Circuit-level Estimation Tool → Estimation Results: Area, Energy, Timing, etc.]
Things to consider (2)
2. Is per-access energy always the same?
• Per-access energy in fact depends on:
– how many bits are switching
– how they are switching (0→1 or 1→0)
• It is reasonable to assume a constant per-access energy over a long observation window (e.g., n = 1M clock cycles), because the number of switching bits averages out (e.g., 50% of bits switching).
• Most architecture simulators do not capture bit-level details because of the simulation complexity.
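The averaging argument can be illustrated by counting bit transitions between consecutive values written to a storage element. This is a sketch under a loud assumption: the written values are uniformly random 32-bit words, which real workloads are not, so the roughly 50% figure it produces only illustrates the idea. All names are hypothetical.

```python
import random


def count_transitions(old, new, width=32):
    """Return (rising, falling): number of 0->1 and 1->0 bit flips from old to new."""
    mask = (1 << width) - 1
    rising = (~old & new) & mask    # bits that were 0 and become 1
    falling = (old & ~new) & mask   # bits that were 1 and become 0
    return bin(rising).count("1"), bin(falling).count("1")


def average_switching_fraction(num_writes, width=32, seed=0):
    """Fraction of bits that flip per write, averaged over random writes (assumption)."""
    rng = random.Random(seed)
    current = rng.getrandbits(width)
    flipped = 0
    for _ in range(num_writes):
        nxt = rng.getrandbits(width)
        rising, falling = count_transitions(current, nxt, width)
        flipped += rising + falling
        current = nxt
    return flipped / (num_writes * width)


fraction = average_switching_fraction(10_000)
print(f"average switching fraction: {fraction:.3f}")
```

A single write may flip anywhere from 0 to 32 bits, but over many writes the fraction settles near one half, which is why a constant per-access energy is a workable approximation over long windows.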
Things to consider (3)
3. If a register file didn’t have read/write accesses but held data, what is the energy dissipation?
• Energy (or power) consists mainly of dynamic and static dissipation.
• Dynamic (or switching) energy is the energy dissipated by switching activity.
• Static (or leakage) energy is the energy dissipated to keep the electronic system turned on.
• In this case, the register file has no dynamic energy dissipation but still consumes static energy.
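The split between dynamic and static energy can be sketched by adding a leakage term to the activity-based calculation. The per-access energies and the 10 mW leakage power below are made-up placeholder values for a hypothetical register file, and the function name is illustrative.

```python
def energy_breakdown_nj(counts, per_access_nj, leakage_w, cycles, clock_hz):
    """Return (dynamic_nj, static_nj) for one component over `cycles` clock periods."""
    # Dynamic energy scales with activity counts.
    dynamic_nj = sum(counts[kind] * per_access_nj[kind] for kind in counts)
    # Static energy scales with elapsed time, regardless of activity.
    elapsed_s = cycles / clock_hz
    static_nj = leakage_w * elapsed_s * 1e9
    return dynamic_nj, static_nj


# Idle register file: it holds data but sees no accesses for 50 cycles at 2 GHz.
idle_counts = {"read": 0, "write": 0}
per_access_nj = {"read": 0.1, "write": 0.1}   # placeholder values
dynamic, static = energy_breakdown_nj(
    idle_counts, per_access_nj, leakage_w=0.010, cycles=50, clock_hz=2e9
)
print(f"dynamic = {dynamic:.2f} nJ, static = {static:.2f} nJ")
```

With zero accesses the dynamic term vanishes while the static term keeps growing with time, which matches the register-file case described on the slide.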