Performance Monitoring on Pentium® 4* Processor
Nidhi (nidhi@intel.com)
IA-32 Performance Architect
*Pentium is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries

Outline
• Pentium® 4* Processor Performance Monitoring features
• Implementation
• How Intel® uses Performance Monitors
• Limitations
• Open issues

Feature Overview
• Counters: 18 programmable counters, each 40 bits wide
• Events: 45 events in various parts of the machine
• Counter increment control
  – qualification by current privilege level (OS, USER)
  – qualification by hardware thread ID
  – edge detection
  – threshold comparison
  – interrupt on counter overflow
• Interface (x86 instructions to set/read counters)
  – WRMSR (write model-specific register)
  – RDMSR (read model-specific register)
  – RDPMC (read performance-monitoring counter)
  – RDTSC (read time-stamp counter)
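The increment controls listed above can be illustrated with a small software model. This is only a sketch, not the hardware logic: the class `ModelCounter`, its parameters, and the exact semantics (threshold means "count cycles where the event count exceeds the threshold", edge means "count only rising transitions of that comparison") are assumptions chosen for illustration.

```python
COUNTER_BITS = 40                      # counter width stated on the slide
COUNTER_MASK = (1 << COUNTER_BITS) - 1


class ModelCounter:
    """Hypothetical software model of one programmable counter."""

    def __init__(self, start=0, threshold=None, edge=False,
                 count_os=True, count_user=True):
        self.value = start & COUNTER_MASK
        self.threshold = threshold     # None disables threshold comparison
        self.edge = edge               # only meaningful with a threshold
        self.count_os = count_os
        self.count_user = count_user
        self.overflowed = False        # stands in for the overflow interrupt
        self._prev_active = False

    def tick(self, events_this_cycle, in_os):
        """Apply one cycle of events and return the increment that was used."""
        # Qualification by current privilege level.
        if (in_os and not self.count_os) or (not in_os and not self.count_user):
            return 0
        if self.threshold is None:
            inc = events_this_cycle
        else:
            active = events_this_cycle > self.threshold
            # Edge detection counts only the inactive-to-active transition.
            inc = int(active and not (self.edge and self._prev_active))
            self._prev_active = active
        total = self.value + inc
        if total > COUNTER_MASK:
            self.overflowed = True
        self.value = total & COUNTER_MASK  # wrap at 40 bits
        return inc


# Usage: count how many times the per-cycle event count rises above 1.
bursts = ModelCounter(threshold=1, edge=True)
for n in [0, 2, 3, 0, 2]:
    bursts.tick(n, in_os=False)
```

In this model `bursts.value` counts bursts rather than cycles, because the two consecutive cycles above the threshold contribute a single rising edge.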

Features Overview, cont.
• Cascading
  – Second counter begins counting when first counter overflows
  – For instance, to measure cycles elapsed after the first counter overflowed
• Tagging
  – Used to get non-speculative event counts
  – Tags micro-ops when they incur an event
  – Counts tagged micro-ops at retirement
  – Three tagging mechanisms: front-end, execution, and replay
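Both ideas can be sketched in a few lines of Python. The names `run_cascade`, `MicroOp`, and `count_events` are hypothetical, and the model assumes the second counter starts on the cycle after the overflow and that the first counter keeps counting after it wraps; the real hardware timing may differ.

```python
from dataclasses import dataclass

COUNTER_MASK = (1 << 40) - 1


def run_cascade(first_start, event_stream):
    """Return (first, second) counter values after consuming the stream.

    `event_stream` yields one (first_events, second_events) pair per cycle.
    The second counter stays idle until the first counter overflows.
    """
    first, second, armed = first_start & COUNTER_MASK, 0, False
    for first_events, second_events in event_stream:
        if armed:
            second = (second + second_events) & COUNTER_MASK
        total = first + first_events
        if total > COUNTER_MASK:
            armed = True  # overflow enables the second counter from the next cycle
        first = total & COUNTER_MASK
    return first, second


@dataclass
class MicroOp:
    had_event: bool  # incurred the event of interest, so it gets tagged
    retires: bool    # False if squashed, e.g. on a mispredicted path


def count_events(uops):
    """Return (speculative_count, retired_tagged_count)."""
    speculative = sum(1 for u in uops if u.had_event)
    retired = sum(1 for u in uops if u.had_event and u.retires)
    return speculative, retired


# Usage: one of the two tagged micro-ops never retires, so only one is counted.
uops = [MicroOp(True, True), MicroOp(True, False), MicroOp(False, True)]
spec, nonspec = count_events(uops)
```

Counting at retirement is what removes the wrong-path micro-op from the total; counting at the point of the event would include it.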

Precise Event Based Sampling
• Mechanism
  – User allocates a PEBS buffer in memory
  – User programs a counter to tag micro-ops and count them as they retire
  – When the counter overflows, the Pentium® 4 Processor's retirement logic forces a microcode assist just before the next tagged micro-op
  – Microcode assist copies the program counter and GPRs into the PEBS buffer in memory
• Advantages
  – Precise: the sample is taken at the instruction that had the event
  – Enables creating data address profiles and locating cache lookup patterns and data relocation opportunities
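The mechanism steps can be mimicked in software to show why the recorded program counter belongs to a tagged micro-op. This is a toy model under stated assumptions: `RetiredUop`, `pebs_sample`, `sample_after`, and `buffer_capacity` are hypothetical names, the sampled micro-op is assumed to count toward the next period, and a full buffer simply drops further records, which is a simplification of whatever the hardware does.

```python
from dataclasses import dataclass, field


@dataclass
class RetiredUop:
    ip: int                       # program counter of the retiring micro-op
    tagged: bool                  # True if it incurred the monitored event
    regs: dict = field(default_factory=dict)  # stand-in for the GPRs


def pebs_sample(retired_stream, sample_after, buffer_capacity):
    """Return the records a PEBS-like buffer would hold for this stream."""
    buffer = []
    remaining = sample_after      # models a counter preset to overflow after N
    pending = False               # True once the counter has overflowed
    for uop in retired_stream:
        if not uop.tagged:
            continue              # untagged micro-ops are never counted or sampled
        if pending:
            # The "assist" runs just before this tagged micro-op retires.
            if len(buffer) < buffer_capacity:
                buffer.append((uop.ip, dict(uop.regs)))
            pending = False
        remaining -= 1
        if remaining == 0:        # counter overflow
            pending = True
            remaining = sample_after
    return buffer


# Usage: tagged micro-ops at 100..106 with untagged ones interleaved.
stream = []
for ip in range(100, 107):
    stream.append(RetiredUop(ip=ip, tagged=True, regs={"eax": ip}))
    stream.append(RetiredUop(ip=ip + 1000, tagged=False))
records = pebs_sample(stream, sample_after=3, buffer_capacity=16)
```

Every record in `records` carries the address of a tagged micro-op, never one of the untagged addresses, which is the precision property the slide describes.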

Implementation Overview

How Intel® Uses Performance Monitors
• Intel® uses Performance Monitoring for:
  – Performance analysis
  – Compiler optimizations
  – System-level optimizations
  – Performance and functional debug
• Many tools built for collecting and analyzing performance monitoring counters
  – Interval Sampler
  – Profiler

Performance Analysis
• Interval sampler
  – Gives the characteristics of the system
• VTune™ Performance Analyzer
  – Event Profiler
  – Gives the distribution of events for the system over the whole application run
  – Available at: http://www.intel.com/software/products/vtune/
• The Interval Sampler points out which events to look for; VTune™ event profiles then help find the function, basic block, or IPs that have the performance problem.
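An interval sampler of the kind described above boils down to differencing periodic counter reads. The sketch below is not Intel's tool: `counter_delta`, `interval_rates`, and the snapshot keys `cycles` and `misses` are hypothetical, and it assumes both values come from 40-bit counters that wrap at most once per interval.

```python
COUNTER_BITS = 40
COUNTER_MODULUS = 1 << COUNTER_BITS


def counter_delta(previous, current):
    """Events between two raw reads, tolerating a single wraparound."""
    return (current - previous) % COUNTER_MODULUS


def interval_rates(snapshots, event, cycles="cycles"):
    """Return events per cycle for each pair of consecutive snapshots."""
    rates = []
    for before, after in zip(snapshots, snapshots[1:]):
        d_event = counter_delta(before[event], after[event])
        d_cycles = counter_delta(before[cycles], after[cycles])
        rates.append(d_event / d_cycles if d_cycles else 0.0)
    return rates


# Usage: three reads give two intervals; the second has a higher miss rate.
snapshots = [
    {"cycles": 0, "misses": 0},
    {"cycles": 1000, "misses": 10},
    {"cycles": 2000, "misses": 60},
]
rates = interval_rates(snapshots, "misses")
```

A jump in the per-interval rate is the kind of signal that tells you which event to profile next with an event-based profiler.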

Limitations
• Not all counters can count all events.
• With Hyper-Threading, the counters may get divided among the logical processors.
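Because each event can only be counted on some counters, a tool has to solve a small assignment problem before a measurement run. The backtracking sketch below is one way to do that; `assign_counters`, the event names, and the counter IDs are hypothetical and do not reflect the actual Pentium 4 event-to-counter table.

```python
def assign_counters(allowed):
    """Map each event to a distinct counter it may use, or return None.

    `allowed` maps an event name to the set of counter IDs able to count it.
    """
    # Place the most constrained events first to fail fast.
    events = sorted(allowed, key=lambda e: len(allowed[e]))
    assignment = {}
    used = set()

    def place(i):
        if i == len(events):
            return True
        event = events[i]
        for counter in sorted(allowed[event]):
            if counter not in used:
                used.add(counter)
                assignment[event] = counter
                if place(i + 1):
                    return True
                used.discard(counter)   # undo and try the next counter
                del assignment[event]
        return False

    return assignment if place(0) else None


# Usage: hypothetical constraints for three events.
plan = assign_counters({"A": {0, 1}, "B": {0}, "C": {1, 2}})
```

When the counters are split among logical processors, the same function applies after removing the counters owned by the other logical processor from each allowed set; a `None` result then means the events must be measured in separate runs.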

Open Questions
• Centralized vs. distributed?
  – Distributed is simpler but less flexible
• Add new events?
  – New usage models
  – Multicore / multithread scenarios
• Feedback is welcome!