
Introduction to Open Source Performance Tools: Linux Tool perf
Yiqi Ju (Fred)
Sep. 13, 2012

Task 07/09~09/14
• Verizon Box
• Embedded System
• Software Environment
• Open Source Performance Tools
• Kernel Profiling

Kernel Profiling?
• Collect and analyze kernel-space, system-wide resource statistics
• HW trend: increasing core counts
• SW performance: find the bottleneck
• Solution: make full use of the available tools

Available Tools
• top (on board) / htop: real-time monitoring
• sysstat utilities: sar, iostat (on board); also vmstat (from procps)…
• ss: socket statistics
• LTTng: kernel tracing
• perf: counting and sampling
• …

Perf Tool
• perf_event kernel interface
• A Linux kernel subsystem, merged in v2.6.31 and present in all later kernels

Perf_event Kernel Interface
• Performance counters: hardware counters that can be used without programming registers by hand; the counting hardware is often called the PMU (Performance Monitoring Unit)
• Event-oriented API: user space does not touch HW registers directly, but the CPU must have a PMU that the kernel supports
• Supports event grouping: events in a group are measured simultaneously
Source: Perf File Format, Urs Fassler, CERN openlab
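From user space, the interface above comes down to one system call, perf_event_open(2), followed by ioctl() and read() on the file descriptor it returns. Below is a minimal Python sketch (via ctypes) that assumes a Linux host on one of the listed architectures. The syscall-number table, the 64-byte attribute layout (PERF_ATTR_SIZE_VER0), and the helper name count_task_clock are choices made for this illustration. It uses the software task-clock event rather than a hardware counter so that it also runs on CPUs without a supported PMU. It returns None if the kernel refuses the call, for example because of a restrictive perf_event_paranoid setting or a seccomp filter.

```python
import ctypes
import fcntl
import os
import platform
import struct

# perf_event_open syscall numbers (assumption: only these machines are handled)
SYSCALL_NR = {"x86_64": 298, "aarch64": 241, "i686": 336, "armv7l": 364}

PERF_TYPE_SOFTWARE = 1
PERF_COUNT_SW_TASK_CLOCK = 1      # task-clock, counted in nanoseconds
PERF_ATTR_SIZE_VER0 = 64          # size of the first published attr layout
PERF_EVENT_IOC_ENABLE = 0x2400    # _IO('$', 0)
PERF_EVENT_IOC_DISABLE = 0x2401   # _IO('$', 1)

# Bit positions inside the attr bitfield word
DISABLED = 1 << 0
EXCLUDE_KERNEL = 1 << 5
EXCLUDE_HV = 1 << 6


class PerfEventAttr(ctypes.Structure):
    """First 64 bytes of struct perf_event_attr (PERF_ATTR_SIZE_VER0)."""
    _fields_ = [
        ("type", ctypes.c_uint32),
        ("size", ctypes.c_uint32),
        ("config", ctypes.c_uint64),
        ("sample_period", ctypes.c_uint64),
        ("sample_type", ctypes.c_uint64),
        ("read_format", ctypes.c_uint64),
        ("flags", ctypes.c_uint64),       # disabled, inherit, ..., exclude_* bits
        ("wakeup_events", ctypes.c_uint32),
        ("bp_type", ctypes.c_uint32),
        ("config1", ctypes.c_uint64),
    ]


def count_task_clock(work):
    """Run work() and return its task-clock in ns, or None if unavailable."""
    nr = SYSCALL_NR.get(platform.machine())
    if nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    attr = PerfEventAttr(type=PERF_TYPE_SOFTWARE, size=PERF_ATTR_SIZE_VER0,
                         config=PERF_COUNT_SW_TASK_CLOCK,
                         flags=DISABLED | EXCLUDE_KERNEL | EXCLUDE_HV)
    # pid=0: this process; cpu=-1: any CPU; group_fd=-1: no group; flags=0
    fd = libc.syscall(ctypes.c_long(nr), ctypes.byref(attr), ctypes.c_long(0),
                      ctypes.c_long(-1), ctypes.c_long(-1), ctypes.c_ulong(0))
    if fd < 0:
        return None  # e.g. perf_event_paranoid or a seccomp filter said no
    try:
        fcntl.ioctl(fd, PERF_EVENT_IOC_ENABLE, 0)
        work()
        fcntl.ioctl(fd, PERF_EVENT_IOC_DISABLE, 0)
        # read_format = 0, so a read returns a single u64 count
        (value,) = struct.unpack("=Q", os.read(fd, 8))
        return value
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(count_task_clock(lambda: sum(range(10**6))))
```

The event starts disabled, and the counter is switched on and off with ioctl. This way it covers only the work() call and not the setup code around it.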

Sampling
• perf record initializes sampling through the perf_event interface
• It creates blank mmap pages that are shared with kernel space
• The kernel writes sample records into those pages and perf reads them back; perf record saves them to a *.data file (perf.data by default) in the current directory
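The following toy producer/consumer model in Python illustrates the shared-page handshake. It is not the real perf mmap layout. The real layout has a control page with data_head/data_tail fields, followed by a power-of-two data area that holds variable-size records. This sketch keeps only the head/tail logic. Each whole record is one list entry, and a drop counter stands in for the kernel's lost-record accounting. ToyRingBuffer and its method names are hypothetical.

```python
class ToyRingBuffer:
    """Toy model of a kernel-to-perf ring buffer (not the real page layout)."""

    def __init__(self, size):
        self.buf = [None] * size
        self.size = size
        self.head = 0   # total records written by the "kernel" (producer)
        self.tail = 0   # total records consumed by "perf record" (consumer)
        self.lost = 0   # records dropped because the buffer was full

    def kernel_write(self, record):
        if self.head - self.tail == self.size:
            self.lost += 1              # full: drop it, like a lost-record count
            return
        self.buf[self.head % self.size] = record
        self.head += 1

    def perf_read(self):
        # Copy everything between tail and head, then release that space
        out = [self.buf[i % self.size] for i in range(self.tail, self.head)]
        self.tail = self.head
        return out


if __name__ == "__main__":
    rb = ToyRingBuffer(4)
    for ip in range(6):          # six samples into a four-slot buffer
        rb.kernel_write(ip)
    print(rb.perf_read(), rb.lost)
```

If the reader does not keep up, the model loses samples instead of blocking the writer. The real kernel makes the same trade-off, and that is why perf record sometimes reports lost events.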

Sampling cont.
[Figure: blank mmap pages generated through perf_events; a written mmap page]
Source: Perf File Format, Urs Fassler, CERN openlab

Advantages
• Low overhead compared to instrumenting profilers
• Fast: the counting adds so little load to the running workload that the delay is hardly noticeable
• Many subcommands that together provide a wealth of information
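Sampling keeps overhead low. An instrumenting profiler hooks every function entry and exit. A sampling profiler is interrupted at a fixed rate and records where execution currently is, so its cost depends on the sample rate and not on how many calls the program makes. The sketch below shows this principle at the Python level, using a SIGPROF interval timer (signal.setitimer, Unix only). It is an analogy and not how perf works internally, since perf takes its samples from PMU overflow interrupts or kernel timers. The names sample_profile and busy_loop are hypothetical.

```python
import collections
import signal
import time


def sample_profile(func, interval=0.001):
    """Run func() and count which Python function was running at each tick
    of a CPU-time interval timer (a statistical, sampling profile)."""
    counts = collections.Counter()

    def on_sample(signum, frame):
        # frame is the Python frame that was executing when SIGPROF arrived
        counts[frame.f_code.co_name if frame else "?"] += 1

    old = signal.signal(signal.SIGPROF, on_sample)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)   # stop the timer
        signal.signal(signal.SIGPROF, old)           # restore old handler
    return counts


def busy_loop(seconds=0.3):
    """Burn roughly `seconds` of CPU time."""
    end = time.process_time() + seconds
    x = 0
    while time.process_time() < end:
        x += 1
    return x


if __name__ == "__main__":
    print(sample_profile(busy_loop).most_common(3))
```

Doubling the interval roughly halves the number of samples, and the overhead falls with it. The program's own call count has no effect on that cost.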

Perf usage
metro-root-perf_record> perf
 usage: perf [--version] [--help] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate    Read perf.data (created by perf record) and display annotated code
   diff        Read two perf.data files and display the differential profile
   list        List all symbolic event types
   lock        Analyze lock events
   probe       Define new dynamic tracepoints
   record      Run a command and record its profile into perf.data
   report      Read perf.data (created by perf record) and display the profile
   sched       Tool to trace/measure scheduler properties (latencies)
   stat        Run a command and gather performance counter statistics
   timechart   Tool to visualize total system behavior during a workload
   top         System profiling tool.
   trace       Read perf.data (created by perf record) and display trace output
   …

List of Events
List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                 [Hardware event]
  instructions                         [Hardware event]
  cache-references                     [Hardware event]
  cache-misses                         [Hardware event]
  branch-instructions OR branches      [Hardware event]
  branch-misses                        [Hardware event]
  bus-cycles                           [Hardware event]

  cpu-clock                            [Software event]
  task-clock                           [Software event]
  page-faults OR faults                [Software event]
  minor-faults                         [Software event]
  major-faults                         [Software event]
  context-switches OR cs               [Software event]
  cpu-migrations OR migrations         [Software event]
  alignment-faults                     [Software event]
  emulation-faults                     [Software event]

  L1-dcache-loads                      [Hardware cache event]
  L1-dcache-load-misses                [Hardware cache event]
  L1-dcache-store-misses               [Hardware cache event]
  L1-dcache-prefetch-misses            [Hardware cache event]
  L1-icache-load-misses                [Hardware cache event]
  L1-icache-prefetch-misses            [Hardware cache event]
  LLC-load-misses                      [Hardware cache event]
  LLC-store-misses                     [Hardware cache event]
  LLC-prefetch-misses                  [Hardware cache event]
  dTLB-load-misses                     [Hardware cache event]
  dTLB-store-misses                    [Hardware cache event]
  dTLB-prefetch-misses                 [Hardware cache event]
  iTLB-load-misses                     [Hardware cache event]
  branch-load-misses                   [Hardware cache event]
  …
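Each name in this list resolves to a (type, config) pair in struct perf_event_attr. The Python sketch below performs that mapping for the names shown above, using the constants from linux/perf_event.h. A hardware cache event packs three values into config: the cache ID, the operation (read/write/prefetch), and the result (access/miss), as id | (op << 8) | (result << 16). encode_event is a hypothetical helper. perf itself accepts more aliases than this table covers.

```python
PERF_TYPE_HARDWARE = 0
PERF_TYPE_SOFTWARE = 1
PERF_TYPE_HW_CACHE = 3

HARDWARE = {"cpu-cycles": 0, "cycles": 0, "instructions": 1,
            "cache-references": 2, "cache-misses": 3,
            "branch-instructions": 4, "branches": 4,
            "branch-misses": 5, "bus-cycles": 6}
SOFTWARE = {"cpu-clock": 0, "task-clock": 1, "page-faults": 2, "faults": 2,
            "context-switches": 3, "cs": 3, "cpu-migrations": 4,
            "migrations": 4, "minor-faults": 5, "major-faults": 6,
            "alignment-faults": 7, "emulation-faults": 8}
# Cache IDs: L1D, L1I, LL, DTLB, ITLB, BPU
CACHE_IDS = {"L1-dcache": 0, "L1-icache": 1, "LLC": 2, "dTLB": 3,
             "iTLB": 4, "branch": 5}
# suffix -> (op: READ=0/WRITE=1/PREFETCH=2, result: ACCESS=0/MISS=1)
OPS = {"loads": (0, 0), "load-misses": (0, 1),
       "stores": (1, 0), "store-misses": (1, 1),
       "prefetches": (2, 0), "prefetch-misses": (2, 1)}


def encode_event(name):
    """Return (perf_event_attr.type, perf_event_attr.config) for an event name."""
    if name in HARDWARE:
        return PERF_TYPE_HARDWARE, HARDWARE[name]
    if name in SOFTWARE:
        return PERF_TYPE_SOFTWARE, SOFTWARE[name]
    for cache, cache_id in CACHE_IDS.items():
        prefix = cache + "-"
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix in OPS:
            op, result = OPS[suffix]
            return PERF_TYPE_HW_CACHE, cache_id | (op << 8) | (result << 16)
    raise ValueError(f"unknown event: {name}")


if __name__ == "__main__":
    for ev in ("cycles", "context-switches", "L1-dcache-load-misses", "dTLB-load-misses"):
        t, cfg = encode_event(ev)
        print(f"{ev:24s} type={t} config={cfg:#x}")
```

For example, dTLB-load-misses encodes as DTLB (3), READ (0), MISS (1), which gives config 0x10003.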

Perf stat
metro-root-perf_record> perf stat -e L1-dcache-loads -e L1-dcache-load-misses -e dTLB-load-misses -e L1-icache-loads -e L1-icache-misses start_appli
Start_appli…
 Performance counter stats for 'start_appli':
   354543239  <not counted>  507073444  305313  2303127335  7994049
   L1-dcache-loads  L1-dcache-load-misses  dTLB-load-misses  L1-icache-load-misses
   (scaled from 80.54%) (scaled from 83.87%) (scaled from 83.89%) (scaled from 83.80%) (scaled from 84.33%)
   74.850334944 seconds time elapsed
 (Data from mt2179, P1.0 board, 12:25 AM, 9/12/2012)
 Miss rate: 0.0602%   Miss rate: 0.347%
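A miss rate is the ratio of misses to accesses. Some event labels were lost from the captured output above, so the pairing below is inferred from the arithmetic: these are the raw counts that reproduce the two miss rates on the slide.

```latex
\begin{aligned}
\text{miss rate} &= \frac{\text{misses}}{\text{accesses}} \times 100\% \\
\frac{305313}{507073444} \times 100\% &\approx 0.0602\% \\
\frac{7994049}{2303127335} \times 100\% &\approx 0.347\%
\end{aligned}
```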

Perf stat cont.
metro-root-perf_record> perf stat -e dTLB-loads -e dTLB-load-misses -e L1-icache-loads -e L1-icache-misses start_appli
…
 Performance counter stats for 'start_appli':

      534611783  dTLB-loads
         308219  dTLB-load-misses
     2375996954  L1-icache-loads
        7810360  L1-icache-load-misses

 Miss rate: 0.0577%   Miss rate: 0.329%

   55.029461151 seconds time elapsed
 (Data collected from mt2179, P1.0 board, 12:35 PM, 9/12/2012)
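The same calculation can be scripted. The Python sketch below assumes output laid out as on this slide, with one count followed by one event name per line. Newer perf versions may insert thousands separators, and the regex also accepts those. parse_perf_stat and miss_rates are hypothetical helper names. The sample text reproduces the counts shown above.

```python
import re

# A count (digits, optionally with thousands separators) or "<not counted>",
# followed by the event name.
STAT_LINE = re.compile(r"^\s*([\d,]+|<not counted>)\s+(\S+)")


def parse_perf_stat(text):
    """Map event name -> count (None when perf printed <not counted>)."""
    counts = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m:
            raw, event = m.groups()
            counts[event] = None if raw == "<not counted>" else int(raw.replace(",", ""))
    return counts


def miss_rates(counts):
    """Percent miss rate for every '<x>-loads' / '<x>-load-misses' pair."""
    rates = {}
    for event, loads in counts.items():
        if event.endswith("-loads"):
            base = event[: -len("-loads")]
            misses = counts.get(base + "-load-misses")
            if loads and misses is not None:
                rates[base] = 100.0 * misses / loads
    return rates


SAMPLE = """\
 Performance counter stats for 'start_appli':

      534611783 dTLB-loads
         308219 dTLB-load-misses
     2375996954 L1-icache-loads
        7810360 L1-icache-load-misses

   55.029461151 seconds time elapsed
"""

if __name__ == "__main__":
    for name, rate in miss_rates(parse_perf_stat(SAMPLE)).items():
        print(f"{name}: {rate:.3g}%")
    # dTLB: 0.0577%
    # L1-icache: 0.329%
```

The regex skips the "seconds time elapsed" line, because the decimal point stops the count pattern from being followed by whitespace.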

Perf record/report
metro-root-perf_record> perf record -F 3000 -o startapp.data start_appli
…
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.560 MB startapp.data (~24470 samples) ]
…
metro-root-perf_record> perf report -i startapp.data > startapp.txt
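perf report turns the raw samples into an overhead table. The Python sketch below shows the aggregation in simplified form. It assumes each sample has already been resolved to a (symbol, period) pair, and it sums periods because in -F frequency mode the period differs from sample to sample. perf itself also groups by command and DSO and can follow call chains. overhead_table is a hypothetical helper, and the symbols in the demo are made up.

```python
from collections import defaultdict


def overhead_table(samples):
    """samples: iterable of (symbol, period). Returns [(symbol, percent)]
    sorted from the largest share of the total period to the smallest."""
    per_symbol = defaultdict(int)
    for symbol, period in samples:
        per_symbol[symbol] += period
    total = sum(per_symbol.values())
    if total == 0:
        return []
    rows = [(sym, 100.0 * p / total) for sym, p in per_symbol.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical samples; real ones come from perf.data
    demo = [("memcpy", 400), ("schedule", 300), ("memcpy", 200), ("main", 100)]
    for sym, pct in overhead_table(demo):
        print(f"{pct:6.2f}%  {sym}")
```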

(Data collected from mt2179, P1.0 board, 12:35 PM, 9/12/2012)

Perf diff
metro-root-perf_record> perf diff lsactive.data lslactive.data
(Data collected from mt2179, P1.0 board, 12:35 PM, 9/12/2012)
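perf diff compares two such profiles. The Python sketch below shows the core idea. It assumes each profile has already been reduced to a mapping from symbol to overhead percentage. For every symbol it reports the baseline value and the change in percentage points, with the largest changes first. profile_diff is a hypothetical helper. The real command also matches symbols across DSOs and offers more options.

```python
def profile_diff(baseline, new):
    """baseline/new: dict symbol -> overhead %. Returns
    [(symbol, baseline %, delta in percentage points)], largest |delta| first.
    Symbols missing from one profile count as 0% there."""
    rows = []
    for sym in set(baseline) | set(new):
        base = baseline.get(sym, 0.0)
        rows.append((sym, base, new.get(sym, 0.0) - base))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical profiles, e.g. reduced from two perf.data files
    before = {"memcpy": 50.0, "schedule": 30.0, "main": 20.0}
    after = {"memcpy": 40.0, "schedule": 30.0, "do_page_fault": 30.0}
    for sym, base, delta in profile_diff(before, after):
        print(f"{base:6.2f}%  {delta:+6.2f}%  {sym}")
```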

Looking Ahead
• perf timechart: visualize total system behavior as a time sequence
• perf trace: enables script-based tracing; Perl support since 2.6.33-rc, with Python support patches available
• perf annotate: attribute samples to the source code
• A perf event converter and a web-based GUI to enable remote profiling
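With scripting support, perf loads a Python file and calls the hook functions defined in it. The minimal sketch below counts events per command name. It assumes a perf version that calls trace_begin(), process_event(param_dict) and trace_end(), and that stores the command name in param_dict["comm"]. The 2.6.33-era interface instead generated one handler function per tracepoint, so running perf script -g python (perf trace -g python on older versions) shows the skeleton your version expects. You could run it with something like perf script -s count_by_comm.py -i startapp.data, where count_by_comm.py is a hypothetical file name.

```python
from collections import Counter

# Per-command event counts, filled in by process_event()
events_by_comm = Counter()


def trace_begin():
    """Called once before the first event."""
    events_by_comm.clear()


def process_event(param_dict):
    """Called once per sample. Assumption: the command name is under "comm"."""
    events_by_comm[param_dict.get("comm", "?")] += 1


def trace_end():
    """Called once after the last event: print a per-command summary."""
    for comm, n in events_by_comm.most_common():
        print(f"{comm:20s} {n:8d}")
```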

Source: Scripting support for perf. Jake Edge, Feb 10, 2010

References
• Perf_event project: http://web.eecs.utk.edu/~vweaver1/projects/perfevents/index.html
• Perf File Format, Urs Fassler, CERN openlab: http://openlab.web.cern.ch/sites/openlab.web.cern.ch/files/technical_documents/Urs_Fassler_report.pdf
• Perf wiki: https://perf.wiki.kernel.org/index.php
• perf_events status update, Stephane Eranian, Google, Inc., Linux kernel mailing list: http://lwn.net/Articles/373842/