Memory Performance Profiling via Sampled Performance Monitor Event Traces
- Slides: 27
Memory Performance Profiling via Sampled Performance Monitor Event Traces
Diana Villa, Jaime Acosta, Patricia J. Teller
The University of Texas at El Paso, Department of Computer Science
Bret Olszewski, Trevor Morgan
IBM Corporation, Austin, TX
Exxon/Mobil
Outline
§ Motivation
§ Data Collection Environment
  • Workload & Platform
  • Monitored Events
§ Sampled Event Traces
§ Performance Evaluation Framework
§ Data Analysis & Results
§ Conclusions and Future Work
Motivation
§ Modern systems: performance governed by memory subsystem
§ SMPs
  • Deeper and larger memory hierarchies
  • Performance analysis considerations: time to results and size of data set
§ Goal: develop a new performance analysis methodology
Data Collection Environment
§ Workload
  • TPC-C benchmark: commercial, OLTP
§ Platform
  • IBM eServer pSeries 690 architecture (p690): 8- and 32-processor configurations
Platform: 8-processor p690 configuration
[Figure: block diagram of MCM 0 and MCM 1, with processor positions marked P or X, L2 caches, and L3]
Platform: 32-processor p690 configuration
[Figure: block diagram of the MCMs, with processors (P), L2 caches, and L3 caches]
Monitored Events
§ L2-cache data-load misses
  • L2.5
  • L2.75
  • L3.5
  • MEM
§ L1-cache data-load miss
  • L2
Load Latencies
  Source   Latency
  L2       12 cycles
  L2.5     73 cycles
  L2.75    96 cycles
  L3       112 cycles
  L3.5     143 cycles
  MEM      320 cycles
[Figure: block diagram of MCM 0 and MCM 1 with processors, L2 caches, and L3, shown alongside each latency]
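One way to read the latency table is to weight the number of samples from each source by that source's latency, which gives a rough picture of where load-stall cycles go. The sketch below is only an illustration: the latencies are the ones listed on the slide, while the function name `latency_weighted_share` and the sample counts are hypothetical placeholders, not measurements from the study.

```python
# Load latencies in cycles, as listed on the slide.
LOAD_LATENCY_CYCLES = {
    "L2": 12,
    "L2.5": 73,
    "L2.75": 96,
    "L3": 112,
    "L3.5": 143,
    "MEM": 320,
}


def latency_weighted_share(sample_counts):
    """Return each source's share of the total latency-weighted samples.

    sample_counts maps a source name (e.g. "L3") to a number of samples.
    Hypothetical helper: it assumes every sample stands for the same
    number of real events and that each load pays the full latency.
    """
    weighted = {
        source: count * LOAD_LATENCY_CYCLES[source]
        for source, count in sample_counts.items()
    }
    total = sum(weighted.values())
    if total == 0:
        return {source: 0.0 for source in weighted}
    return {source: cycles / total for source, cycles in weighted.items()}


if __name__ == "__main__":
    # Placeholder counts for illustration only; not results from the study.
    example_counts = {"L2": 1000, "L3": 100, "MEM": 50}
    for source, share in latency_weighted_share(example_counts).items():
        print(f"{source}: {share:.1%} of latency-weighted samples")
```

With counts like these, a source with few samples but a long latency (MEM) can outweigh one with many samples and a short latency (L2), which is why raw sample counts alone can mislead.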
Data Collection
§ 10-minute observation interval
§ Performance Monitoring Unit (PMU)
  • Special-purpose registers
  • Programming interface: kernel extension
§ eprof
  • PMU configuration
  • Event-based sampling
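Event-based sampling is commonly implemented with a counter that is reloaded with a sampling period and triggers a recording each time the period elapses. The toy model below illustrates that idea only. The class `SamplingCounter` and its methods are hypothetical and do not represent the eprof interface or the p690 PMU registers, which the slides do not describe.

```python
class SamplingCounter:
    """Toy model of event-based sampling (hypothetical, not the eprof API).

    A counter is decremented on every event occurrence; when it reaches
    zero, the current context is recorded as a sample and the counter is
    reloaded with the sampling period.
    """

    def __init__(self, period):
        if period < 1:
            raise ValueError("period must be at least 1")
        self.period = period
        self.remaining = period
        self.samples = []

    def on_event(self, context):
        """Count one event occurrence; record context every `period` events."""
        self.remaining -= 1
        if self.remaining == 0:
            self.samples.append(context)
            self.remaining = self.period


if __name__ == "__main__":
    counter = SamplingCounter(period=4)
    for event_number in range(1, 11):
        counter.on_event(event_number)
    print(counter.samples)  # events 4 and 8 are recorded
```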
Sampled Event Traces
§ Sampling
  • Record periodic occurrences of an event
  • 100 events/sec/CPU
§ Event record
  PID      TID      Timestamp     Effective Instruction Address   Effective Data Address
  372872   184469   0.328104637   000000A8C4                      00000218880
§ Average number of samples collected/event
  • 238,448 for 8-processor data
  • 212,396 for 32-processor data
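A record like the one on this slide could be parsed as follows. This is a minimal sketch under stated assumptions: the slides do not give the on-disk trace format, so the whitespace-separated layout, decimal PID/TID, and hexadecimal addresses are assumptions, and `EventRecord` and `parse_event_record` are hypothetical names. The input is the slide's example record with extraction spaces removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    pid: int
    tid: int
    timestamp: float
    instruction_address: int  # effective instruction address
    data_address: int         # effective data address


def parse_event_record(line):
    """Parse one whitespace-separated record line.

    Assumed layout (not confirmed by the slides):
    PID TID Timestamp InstrAddr DataAddr, with addresses in hexadecimal.
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    pid, tid, timestamp, instr_addr, data_addr = fields
    return EventRecord(
        pid=int(pid),
        tid=int(tid),
        timestamp=float(timestamp),
        instruction_address=int(instr_addr, 16),
        data_address=int(data_addr, 16),
    )


if __name__ == "__main__":
    record = parse_event_record(
        "372872 184469 0.328104637 000000A8C4 00000218880"
    )
    print(record)
```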
Performance Framework
[Figure: pipeline diagram]
§ Data Collection Environment: TPC-C on p690
§ Sampled Event Traces: PID, TID, Timestamp, Instr. Addr., Data Addr.
§ Load DB (Java tool) into a database
§ Report Generation (Java tool)
§ Reports and graphs; sample report rows as shown on the slide:
  5   Buffer Pool       56893   29384
  6   Data, BSS, Heap   8799    4855
  1   Kernel            23485   9840
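The load-then-report flow of the framework can be sketched in a few lines. The slides say the actual tools are written in Java and do not describe the database or its schema, so everything here is a stand-in: Python with the standard `sqlite3` module replaces the Java tools, and the `samples` table, its `region` column, and the helper names are hypothetical.

```python
import sqlite3


def load_samples(connection, samples):
    """Create the samples table and insert (pid, tid, ts, iaddr, daddr, region) rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "pid INTEGER, tid INTEGER, ts REAL, "
        "instr_addr INTEGER, data_addr INTEGER, region TEXT)"
    )
    connection.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)", samples
    )
    connection.commit()


def samples_per_region(connection):
    """Return (region, sample_count) rows, largest count first."""
    cursor = connection.execute(
        "SELECT region, COUNT(*) AS n FROM samples "
        "GROUP BY region ORDER BY n DESC, region ASC"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Made-up rows for illustration; not data from the study.
    rows = [
        (1, 10, 0.10, 0x1000, 0x20000000, "buffer pool"),
        (1, 11, 0.20, 0x1004, 0x20000080, "buffer pool"),
        (2, 20, 0.30, 0x2000, 0x30000000, "kernel"),
    ]
    conn = sqlite3.connect(":memory:")
    load_samples(conn, rows)
    for region, count in samples_per_region(conn):
        print(region, count)
```

Keeping the samples in a database means each new report (per segment, per page, per instruction) is one more query rather than another pass over the raw trace files.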
Analysis
• Identify application-specific sources of performance degradation associated with data references
[Figure: drill-down of data references by address-space region (kernel, text, data/bss/heap, buffer pool), segment, page, page offset/cache line, level of memory hierarchy, and instruction/data structure]
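The segment, page, and cache-line levels of this drill-down amount to integer arithmetic on the effective data address. The sketch below shows that arithmetic under assumed sizes: 256 MB segments, 4 KB pages, and 128-byte cache lines. None of these sizes is stated on the slides, and the function name `decompose_address` is hypothetical.

```python
# Assumed sizes (not stated on the slides): 256 MB segments, 4 KB pages,
# and 128-byte cache lines.
SEGMENT_SIZE = 256 * 1024 * 1024
PAGE_SIZE = 4 * 1024
CACHE_LINE_SIZE = 128


def decompose_address(address):
    """Split an effective address into segment, page, offset, and cache line."""
    offset_in_segment = address % SEGMENT_SIZE
    page_offset = address % PAGE_SIZE
    return {
        "segment": address // SEGMENT_SIZE,
        "page_in_segment": offset_in_segment // PAGE_SIZE,
        "page_offset": page_offset,
        "cache_line_in_page": page_offset // CACHE_LINE_SIZE,
    }


if __name__ == "__main__":
    # Address taken from the example event record, read as hexadecimal.
    print(decompose_address(0x218880))
```

Grouping samples by each of these keys in turn gives the per-segment, per-page, and per-cache-line views that the results slides name.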
Results
32-Processor Results: Memory Regions
32-Processor Results: L3 Caches
32-Processor Results: Segments
32-Processor Results: Pages
32-Processor Results: Cache Lines
32-Processor Results: Instructions
  Lock Operations      Atomic Operations
  simple_lock          fetch_and_add
  simple_lock_ppc      fetch_and_add_h
  simple_unlock        fetch_and_addlp
  disable_lock         fetch_and_or
  unlock_enable        fetch_and_orlp
  simple_unlock_mem    fetch_and
  unlock_enable_mem    fetch_andlp
Conclusions
§ Targets for performance improvement of TPC-C are associated mainly with two regions of the address space:
  • buffer pool
  • data, bss, heap
§ TPC-C lock instructions are not key to performance degradation
§ 8- and 32-processor data have the same reference pattern; thus, a model of TPC-C memory access may be possible
Future Work
§ Suggest ways to improve p690 application performance
§ Enhance performance evaluation framework
§ Quantify representativeness of sampled event traces
§ Expand study of application data load behavior
  • Process characterization
  • Process migration
  • Other performance issues: compulsory vs. capacity/conflict misses, false sharing, contention for resources
§ Develop synthetic applications
  • Mimic the behavior of key p690 applications
  • Use these to study application behavior and experiment with modifications to applications that may affect performance
Thank You. Questions?