Hardware Performance Counters for Detailed Runtime Power and Thermal Estimations: Experiences & Proposals
Canturk Isci, Gilberto Contreras, Margaret Martonosi
Parapet Research Group, Princeton University (EE)
Workshop on Hardware Performance Monitor Design and Functionality, HPCA-11, Feb 13, 2005

Hardware Performance Counters (HPCs) Go Beyond Performance
§ Several research avenues explored:
   § Runtime power/thermal estimation
   § Dynamic management
   § Workload phase and application behavior prediction
§ HPCs provide value beyond simulation:
   § Long timescales
   § Real-system behavior

Hardware Performance Counters (HPCs) Go Beyond Performance
§ Runtime power
   § Isci & Martonosi [MICRO 2003]
   § Contreras & Martonosi [Submitted 2005]
§ Runtime thermal
   § Lee & Skadron [HP-PAC in IPDPS 2005]
§ Dynamic power management
   § Choi et al. [ISLPED 2004]
   § Weißel & Bellosa [CASES 2002]
§ Dynamic thermal management
   § Bellosa et al. [COLP 2003]
§ Workload phases and application behavior prediction
   § Isci & Martonosi [WWC 2003]
   § Duesterwald et al. [PACT 2003]

High-Performance Corner: P4 Power Estimation
§ Idea: estimate the power of each physical component $i$ as
   $\mathrm{Power}[i] = \mathrm{MaxPower}[i] \times \mathrm{ArchScaling}[i] \times \mathrm{AccessRate}[i] + \mathrm{NonGatedPower}[i]$
§ Motivation:
   § Fast (real-time)
   § Estimated view of on-chip detail (per physical component)
§ Design:
   § Developed heuristics using 24 events to approximate access rates for 22 chip components
   § Used 15 counters with 4 rotations to collect all event data
§ Validation:
   § Real-time estimates compared against real-time measured power
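
A minimal Python sketch of the per-component model above, assuming access rates have already been derived from counter events. The slide does not give the 24-event heuristics, so `access_rate_from_counts` uses a hypothetical simple heuristic (events per cycle), and the component names and coefficients in the usage example are illustrative placeholders, not the values used for the P4.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    """One physical chip component in the counter-driven power model."""
    name: str
    max_power: float        # W at full access rate (placeholder values in the example)
    arch_scaling: float     # unitless architectural scaling factor
    non_gated_power: float  # W drawn regardless of access rate


def access_rate_from_counts(event_count: int, cycles: int) -> float:
    """Hypothetical heuristic: accesses per cycle over one sampling interval."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return event_count / cycles


def component_power(c: Component, access_rate: float) -> float:
    """Power[i] = MaxPower[i] * ArchScaling[i] * AccessRate[i] + NonGatedPower[i]."""
    return c.max_power * c.arch_scaling * access_rate + c.non_gated_power


def total_power(components: List[Component], rates: Dict[str, float]) -> float:
    """Sum the per-component estimates for one sampling interval."""
    return sum(component_power(c, rates[c.name]) for c in components)


if __name__ == "__main__":
    parts = [
        Component("L1 cache", max_power=10.0, arch_scaling=0.5, non_gated_power=1.0),
        Component("FP unit", max_power=5.0, arch_scaling=1.0, non_gated_power=0.5),
    ]
    rates = {
        "L1 cache": access_rate_from_counts(400, 1000),
        "FP unit": access_rate_from_counts(200, 1000),
    }
    print(total_power(parts, rates))
```

Keeping each component separate, rather than fitting one chip-wide number, is what gives the per-unit view the slide lists as a motivation.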

P4 Power Estimator Results
[Figure: measured vs. modeled power for gcc, gzip, vpr, vortex, gap, crafty]
§ Average difference: ~5% across all benchmarks
   § SPEC CPU2000 and other applications

Embedded Corner: PXA255 Power Estimation
§ Idea: linear models over counter readings (n = number of samples):
   $\mathrm{CPUPower}_{n\times 1} = \mathrm{PerformanceEvents}_{n\times 5} \times \mathrm{LinearParameters}_{5\times 1} + \mathrm{IdlePower}$
   $\mathrm{MemPower}_{n\times 1} = \mathrm{PerformanceEvents}_{n\times 2} \times \mathrm{LinearParameters}_{2\times 1} + \mathrm{IdlePower}$
§ Motivation:
   § Runtime power optimizations under DVFS
§ Design:
   § Parameter estimation (OLS) using dominant counter readings and live power measurements
   § Power estimation at various CPU configurations
§ Validation:
   § Comparison between estimates and real-time measured power
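
The slide names ordinary least squares but not the solver. The sketch below is a stdlib-only Python illustration, assuming each row of `events` holds the counter readings for one sampling interval and `power` holds the matching measured power; the appended column of ones plays the role of IdlePower. The function names are hypothetical. Solving the normal equations directly is adequate for a handful of counters but can be ill-conditioned, so a QR-based least-squares routine (and counts normalized to rates) would be the more robust choice on real data.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: counter columns are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_power_model(
    events: Sequence[Sequence[float]], power: Sequence[float]
) -> Tuple[List[float], float]:
    """OLS fit of power ~= events @ params + idle_power; returns (params, idle_power)."""
    x = [list(row) + [1.0] for row in events]  # trailing 1.0 column models IdlePower
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, power)) for i in range(k)]
    theta = _solve(xtx, xty)
    return theta[:-1], theta[-1]


def predict_power(row: Sequence[float], params: Sequence[float], idle_power: float) -> float:
    """Estimate power for one new sample of counter readings."""
    return sum(e * p for e, p in zip(row, params)) + idle_power


if __name__ == "__main__":
    # Synthetic data generated from known parameters, to check the fit recovers them.
    events = [(1, 0), (0, 1), (2, 3), (4, 1), (3, 5)]
    power = [2.0 * a + 0.5 * b + 1.0 for a, b in events]
    params, idle = fit_linear_power_model(events, power)
    print(params, idle)
```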

PXA255 Results
[Figure: DB benchmark on CDC Java]
§ 5% average error across 3 domains:
   § Java CDC
   § Java CLDC
   § SPEC 2000

Proposals from Experiences
§ 1. Track each physical unit individually for power & thermal:
   § Ex: [Figure: P4 units: μcode ROM, trace cache, μop queue, allocate, rename, instruction queues 1 and 2, schedulers, dispatch ports, EXE, MEM], all tracked via the count of in-flight μops written to the μop queue
   § Need individual utilization counts for each physical unit on the die for power and hotspot analyses

Proposals from Experiences
§ 2. Need bitline activity counts
   § Utilization is not complete information; power depends in part on the switching factor
   [Figure: PXA255 processor at 400 MHz, 1.3 V: 30 mW (10%) power swing]
   § Not necessarily fully detailed counts:
      § Accumulate bitwise XOR of current and previous input/output port values
      § Sample register file ports/bit populations

Proposals from Experiences
§ 2. Need bitline activity counts
   § Utilization is not complete information; power depends in part on the switching factor
   [Figure: PXA255 processor at 400 MHz, 1.3 V: additions with operand A = 000…01 and operand B taking different bit patterns (e.g., 111…11 / 000…00) show a 20 mW power swing]
   § Not necessarily fully detailed counts:
      § Accumulate bitwise XOR of current and previous input/output port values
      § Sample register file ports/bit populations
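
The XOR-accumulation idea can be sketched in Python: for each new value on a port, count the bits that differ from the previous value (the popcount of the XOR) and add it to a running total. The 32-bit width and the function names are assumptions for illustration; a hardware counter would do the same per cycle rather than over a Python list. The usage mirrors the slide's contrast: a port that sees 000…01 repeatedly and one that alternates between 111…11 and 000…00 have identical utilization but very different switching.

```python
from typing import Iterable


def toggle_count(values: Iterable[int], width: int = 32) -> int:
    """Total bit toggles on a port: sum of popcount(prev XOR cur) over consecutive values."""
    mask = (1 << width) - 1
    total = 0
    prev = None
    for v in values:
        v &= mask  # model a fixed-width port
        if prev is not None:
            total += bin(prev ^ v).count("1")
        prev = v
    return total


def switching_factor(values: Iterable[int], width: int = 32) -> float:
    """Fraction of bit positions that toggle per transition (0.0 to 1.0)."""
    values = list(values)
    transitions = len(values) - 1
    if transitions <= 0:
        return 0.0
    return toggle_count(values, width) / (transitions * width)


if __name__ == "__main__":
    steady = [0b1] * 5                                        # 000...01 on every access
    alternating = [0xFFFFFFFF, 0, 0xFFFFFFFF, 0, 0xFFFFFFFF]  # 111...11 / 000...00
    print(len(steady) == len(alternating))                    # same number of accesses
    print(switching_factor(steady), switching_factor(alternating))
```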

Proposals from Experiences
§ 3. More detailed off-chip/memory access support in the embedded domain
   § Memory power is ~40% of system power
   § Tracking memory hierarchy transactions may yield better memory power estimates:
      Ø Main memory reads/writes (core + DMA)
      Ø Transaction length in bytes
      Ø Activity factors (can be shared with the register file)
   [Figure: REX memory power consumption (one 16b bank)]
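
One way such counters could feed a memory power estimate is a per-interval energy model in the spirit of the PXA255 linear model: a cost per transaction plus a cost per byte transferred, with core and DMA traffic counted together. This is a hypothetical sketch: the `MemCounters` fields stand for counters the slide proposes rather than ones the PXA255 provides, and the energy coefficients are placeholders that would have to be fit against measured power (for example with OLS, as on the PXA255 slide).

```python
from dataclasses import dataclass


@dataclass
class MemCounters:
    """Hypothetical per-interval counters for main-memory traffic."""
    core_reads: int
    core_writes: int
    dma_reads: int
    dma_writes: int
    bytes_transferred: int  # summed transaction length in bytes


def memory_power_w(
    c: MemCounters,
    interval_s: float,
    idle_power_w: float,
    energy_per_txn_j: float,
    energy_per_byte_j: float,
) -> float:
    """Average memory power over one interval: idle power plus transaction energy / time."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    txns = c.core_reads + c.core_writes + c.dma_reads + c.dma_writes
    energy_j = txns * energy_per_txn_j + c.bytes_transferred * energy_per_byte_j
    return idle_power_w + energy_j / interval_s


if __name__ == "__main__":
    sample = MemCounters(core_reads=100, core_writes=50, dma_reads=30,
                         dma_writes=20, bytes_transferred=6400)
    # Placeholder coefficients, not measured values.
    print(memory_power_w(sample, interval_s=1e-3, idle_power_w=0.01,
                         energy_per_txn_j=1e-9, energy_per_byte_j=1e-10))
```

Counting DMA traffic alongside core traffic matters here because DMA transfers reach memory without passing through the core's own events.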

Proposals from Experiences
§ 4. Metrics related to queue occupancy
   § A modern processor contains several queues
   § Depending on the implementation, power ∝ queue occupancy
   [Figure from Buyuktosunoglu et al., "Tradeoffs in Power-Efficient Issue Queue Design" [ISLPED'02]]
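
A queue occupancy counter could simply add the number of valid entries to an accumulator every cycle, so that accumulator / cycles gives the average occupancy over a sampling interval. The Python class below is a hypothetical software model of such a counter; its `estimated_power` method assumes an implementation where power grows linearly with occupied entries (which, as the slide notes, depends on the implementation), and its per-entry coefficient is a placeholder.

```python
class OccupancyCounter:
    """Software model of a hypothetical per-queue occupancy counter."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.accumulated = 0  # sum of occupancy over all sampled cycles
        self.cycles = 0

    def tick(self, occupancy: int) -> None:
        """Record the number of valid entries in the queue for one cycle."""
        if not 0 <= occupancy <= self.capacity:
            raise ValueError("occupancy out of range")
        self.accumulated += occupancy
        self.cycles += 1

    def average_occupancy(self) -> float:
        return self.accumulated / self.cycles if self.cycles else 0.0

    def utilization(self) -> float:
        """Average fraction of the queue that is occupied."""
        return self.average_occupancy() / self.capacity

    def estimated_power(self, power_per_entry_w: float, base_power_w: float = 0.0) -> float:
        """Assumes power grows linearly with occupied entries (implementation dependent)."""
        return base_power_w + power_per_entry_w * self.average_occupancy()


if __name__ == "__main__":
    issue_queue = OccupancyCounter(capacity=32)
    for occupancy in (0, 4, 8, 12):
        issue_queue.tick(occupancy)
    print(issue_queue.average_occupancy(), issue_queue.utilization())
```

Only two registers (the accumulator and a cycle count) are needed per queue, which is why such a counter is cheap compared with logging per-cycle occupancy.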

Proposals from Experiences
§ 5. General/aggregate metrics, in addition to specialized cases/breakdowns, simplify runtime sampling of unit accesses
   § P4 ex. 1. MOB: the only event is MOB_load_replays
      Ø Counts replays due to unknown store address/data and partial/unaligned address matches
      Ø No information on MOB entries/accesses/updates
   § P4 ex. 2. FPU: 8 separate events (with 2 dedicated ESCRs)
      Ø Need at least 4 rotations to collect them all
   § P4 ex. 3. INT ALU: no dedicated event
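
The FPU example follows from a simple bound: if a set of events can only be counted through a group of g counters/ESCRs, those events need at least ceil(events / g) rotations, here ceil(8 / 2) = 4. The Python sketch below computes that bound, builds a greedy schedule, and scales each count by the number of rotations, assuming equal-length time slices. The event and group names are hypothetical placeholders, not actual P4 event names, and the model ignores the other constraints real counter hardware imposes (for instance, ceil(24 / 15) = 2, fewer than the 4 rotations the P4 estimator used).

```python
import math
from collections import defaultdict
from typing import Dict, List


def min_rotations(event_group: Dict[str, str], group_capacity: Dict[str, int]) -> int:
    """Lower bound: max over groups of ceil(#events in group / #counters for group)."""
    per_group: Dict[str, int] = defaultdict(int)
    for group in event_group.values():
        per_group[group] += 1
    return max((math.ceil(n / group_capacity[g]) for g, n in per_group.items()), default=0)


def rotation_schedule(
    event_group: Dict[str, str], group_capacity: Dict[str, int]
) -> List[List[str]]:
    """Greedy schedule: each rotation counts at most group_capacity[g] events of group g."""
    by_group: Dict[str, List[str]] = defaultdict(list)
    for event, group in event_group.items():
        by_group[group].append(event)
    n_rot = min_rotations(event_group, group_capacity)
    schedule: List[List[str]] = [[] for _ in range(n_rot)]
    for group, events in by_group.items():
        cap = group_capacity[group]
        for idx, event in enumerate(events):
            schedule[idx // cap].append(event)  # fill rotation 0 first, then 1, ...
    return schedule


def extrapolate(raw_count: int, rotations: int) -> float:
    """Scale a count seen in one of `rotations` equal time slices to the full interval."""
    return float(raw_count * rotations)


if __name__ == "__main__":
    fpu_events = {f"FP_EVENT_{i}": "FP_ESCR_PAIR" for i in range(8)}  # hypothetical names
    capacity = {"FP_ESCR_PAIR": 2}
    print(min_rotations(fpu_events, capacity))
    for i, rotation in enumerate(rotation_schedule(fpu_events, capacity)):
        print(i, rotation)
```

The equal-slice extrapolation is where rotation costs accuracy: a workload phase that falls entirely in one slice is over- or under-counted for the events measured in the other slices.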

Additional Comments for HPC Design
§ General/aggregate metrics, in addition to specialized cases/breakdowns, simplify runtime sampling of unit accesses
§ Metrics related to register file accesses vs. forwarding
§ Semi-distributed implementations will always induce dependencies among simultaneously countable events
§ Higher parallelism among (power-oriented) metrics for minimal counter rotations at runtime
§ Implementations that allow counter rotations without the need for intermediate logging
   § Partitioned / dual-mode / buffered counters
§ Different events for different types of accesses to the same unit, since they have power implications of different magnitudes
   § e.g., branch scan < BHT update < BTA update
§ Different API/SW demands:
   § Lightweight implementations for runtime analyses
   § Per-thread counts for application profiling vs. global counts for real-time measurement comparisons and hotspots
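
To see why separate events per access type matter, consider a unit whose accesses have different costs, such as the branch example above. With per-type counts, energy is a weighted sum; with a single aggregate count, one can only assume an average cost. The per-access energies below are made-up placeholders chosen only to respect the ordering branch scan < BHT update < BTA update; they are not measured values, and the function names are hypothetical.

```python
from typing import Dict

# Hypothetical per-access energies (nJ); only their ordering comes from the slide.
BRANCH_ENERGY_NJ: Dict[str, float] = {
    "branch_scan": 0.1,
    "bht_update": 0.3,
    "bta_update": 0.5,
}


def energy_from_sub_events(
    counts: Dict[str, int], cost: Dict[str, float] = BRANCH_ENERGY_NJ
) -> float:
    """Weighted sum: each access type contributes count * per-access energy."""
    return sum(n * cost[kind] for kind, n in counts.items())


def energy_from_aggregate(
    total_accesses: int, cost: Dict[str, float] = BRANCH_ENERGY_NJ
) -> float:
    """Best guess with one aggregate counter: assume every access has the mean cost."""
    return total_accesses * sum(cost.values()) / len(cost)


if __name__ == "__main__":
    counts = {"branch_scan": 1000, "bht_update": 100, "bta_update": 10}
    print(energy_from_sub_events(counts))               # per-type estimate
    print(energy_from_aggregate(sum(counts.values())))  # aggregate-only estimate
```

With this mix, dominated by cheap scans, the aggregate-only estimate is more than twice the per-type one; the size of the gap depends entirely on the assumed costs and access mix.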

Wishlist for Power/Thermal
§ 1) For each physical unit on die, separate events to track utilization rates
   § Sub-events for different types of accesses with different power costs
§ 2) Bitline activity counters for switching units
§ 3) Occupancy counters for related queues
§ 4) Counter support for off-core memory accesses
§ 5) High parallelism among power events for minimal counter rotations

Conclusions
§ New opportunities remain to be explored in future PMC designs for power and thermal studies:
   § Direct correspondence to physical units
   § Bitline and occupancy counters
§ We believe these additions are feasible given the continuing emphasis on counter design, as long as power is also considered a primary design target.