Lost in the SMT World Fabio Massimo Ottaviani
EPV Technologies User Group 2015
Agenda
• Introduction
• Terminology
• SMT overview
• Capacity Factors
• CORE productivity and utilization
• MT-1 equivalent time
• Conclusions
Introduction
Introduction
• Simultaneous Multithreading (SMT) is already used on other platforms
• Currently available technologies cannot deliver large further improvements in processor speed, so IBM introduced SMT on the mainframe with the z13 announcement
• Only for zIIP and IFL (for the moment)
• The reason for this prudent approach is that, from a Capacity Management point of view, this is a very critical change
Terminology
Terminology
“The CPU Activity section reports on logical core and logical processor activity. For each processor, the report provides a set of calculations that are provided at a particular granularity that depends on whether multithreading is disabled or enabled...”
RMF Report Analysis V2R2, SC34-2665-02
Terminology
“If multithreading is disabled for a processor type, all calculations are at logical processor granularity. If multithreading is enabled for a processor type, some calculations are provided at logical core granularity and some are provided at logical processor (thread) granularity.”
RMF Report Analysis V2R2, SC34-2665-02
Terminology
• What do you mean by CPU if you are:
  ✓ PR/SM: PU, Physical Processor, CP (SMT term: CORE)
  ✓ z/OS: Logical Processor, LCP (SMT terms: Logical CORE, Thread)
  ✓ Application: Logical Processor (SMT term: Thread)
SMT overview
SMT overview
• Mainframe cores process instructions in multiple pipes, each composed of a number of stages; each stage performs one step in the processing of an instruction, similar to an assembly line
• But without SMT a core can operate on only a single instruction stream
• A big part of the core capacity is normally wasted when an instruction stream stalls waiting for a cache miss to be resolved
SMT overview
• With SMT, multiple instruction streams can be processed simultaneously; when a thread is waiting for a cache miss, the core can continue doing work on behalf of the other threads
• Unfortunately, the additional throughput from SMT does not scale very well with the number of threads
• This is because all the threads on a core share some limited resources (e.g. pipes, processor cache, TLB)
SMT overview
• To activate SMT on z/OS, you have to:
  ✓ define the PROCVIEW CORE option in LOADxx; if you do not want to use SMT you can omit the PROCVIEW parameter or specify PROCVIEW CPU, which is the default; an IPL is needed to change it
  ✓ set MT_ZIIP_MODE=2 in IEAOPTxx; it can be changed dynamically
SMT overview
• MT-1 means that there is only 1 thread per CORE; this is the only possible option for standard CPs at the moment
• MT-2 means that there are 2 threads per CORE; you can activate it on zIIPs (or IFLs)
SMT overview
• MT-1: faster execution, lower throughput
• MT-2: slower execution, higher throughput
SMT overview
• Expected speed reduction when 2 threads are active:
  ✓ Similar to having more, slower engines
  ✓ In the 30-40% range
SMT overview
• Throughput variability:
  ✓ Throughput depends on workload (thread) characteristics
  ✓ On average up to 40% increase when 2 threads are active
  ✓ But it may also decrease
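The relationship between per-thread speed reduction and core throughput can be illustrated with a deliberately simple model. This is only a sketch: it assumes both threads are always active and slow down by the same fraction, which real workloads will not do. The function name `relative_throughput` is hypothetical and does not come from the slides or from any IBM tool.

```python
def relative_throughput(active_threads: int, speed_reduction: float) -> float:
    """Core throughput relative to MT-1.

    Assumption (not from the slides): every active thread runs at
    (1 - speed_reduction) of the MT-1 speed, and all threads stay busy.
    """
    if active_threads not in (1, 2):
        raise ValueError("this sketch only models MT-1 and MT-2")
    if active_threads == 1:
        return 1.0
    return active_threads * (1.0 - speed_reduction)


# 30% and 40% are the range quoted on the slide; 55% is a hypothetical
# value chosen to show a case where throughput decreases.
for reduction in (0.30, 0.40, 0.55):
    gain = relative_throughput(2, reduction)
    print(f"{reduction:.0%} slower threads -> {gain:.2f}x MT-1 throughput")
```

Under this model a 30% speed reduction corresponds to the 40% throughput increase mentioned above, and throughput falls below the MT-1 level once each thread loses more than half of its speed.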
SMT overview
[Figure: layered diagram with applications (APPL), z/OS, PR/SM and physical hardware, showing threads (THR) and COREs; labels: SMT unaware, Max Throughput, Max Core Availability, Limit variability]
Capacity Factors
Capacity Factors
• The MT-2 Maximum Capacity Factor (MAX CF) is the ratio of the maximum amount of work that can be accomplished using 2 threads to the amount of work that would have been accomplished with 1 thread
• MT-1 Max Capacity Factor is 1.0
• MT-2 Max Capacity Factor is workload dependent; the maximum theoretical value is 2.0
Capacity Factors
• The MT-2 Capacity Factor (CF) is the ratio of the amount of work that has actually been accomplished using 1 or 2 threads to the amount of work that would have been accomplished with multithreading disabled
• If Thread Density (TD) is 1 most of the time, CF should be close to 1; if TD is 2 most of the time, CF should be close to MAX CF
Capacity Factors
• In this RMF report snapshot you can note that:
  ✓ MT-1 is used for CP; MAX CF, CF and AVG TD are all 1
  ✓ MT-2 is used for zIIP; MAX CF is 1.804 and CF is 1.746
  ✓ zIIP CF and MAX CF are very close because TD is almost 2 (1.928)
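One way to see how CF moves between 1 and MAX CF as thread density grows is a linear interpolation on TD. This is an assumption made for illustration, not a formula taken from the RMF documentation, and the function name `estimated_cf` is hypothetical. With the snapshot values quoted above it gives a result close to the reported CF.

```python
def estimated_cf(max_cf: float, avg_td: float) -> float:
    """Interpolate CF linearly between 1.0 (TD = 1) and MAX CF (TD = 2).

    Assumption: CF grows linearly with average thread density.
    """
    if not 1.0 <= avg_td <= 2.0:
        raise ValueError("average thread density must be between 1 and 2")
    return 1.0 + (avg_td - 1.0) * (max_cf - 1.0)


# Values from the RMF snapshot: MAX CF = 1.804, AVG TD = 1.928
print(round(estimated_cf(1.804, 1.928), 3))
```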
Capacity Factors
• MAX CF is an estimate of the maximum possible throughput
• It is also used to re-evaluate CPU utilization, which is no longer simply measured in MT-2
• This is needed to keep throughput and utilization proportional
Capacity Factors
• The throughput unit of measurement is not relevant here
• Throughput, Thread 1 busy and Thread 2 busy are measured
• Throughput at 100% busy of 1 thread is calculated as 1,000 / 0.8 = 1,250
Capacity Factors
• Without an estimate of the maximum possible throughput increase with two threads (MAX CF), Core busy cannot be calculated
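Capacity Factors
• Max Throughput = 1,250 × (1 + 0.4) = 1,750, where 0.4 is the estimated maximum throughput increase with two threads (that is, MAX CF = 1.4)
• Core busy = 1,000 / 1,750 = 57.1%
• A linear relationship between throughput and core busy has been assumed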
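The arithmetic of this example can be sketched as follows. The function and parameter names are hypothetical; the factor of 1.4 is the value implied by the 1,250 and 1,750 figures on the slide, and the calculation relies on the same linear relationship between throughput and core busy that the slide assumes.

```python
def core_busy(measured_throughput: float,
              thread_busy: float,
              max_cf: float) -> float:
    """Estimate core busy as measured throughput over maximum throughput.

    thread_busy is the busy fraction of a single thread (0.8 = 80%).
    max_cf is the estimated maximum capacity factor with two threads.
    """
    # Throughput one thread would deliver at 100% busy (1,000 / 0.8 = 1,250)
    single_thread_max = measured_throughput / thread_busy
    # Maximum throughput of the core with two threads (1,250 * 1.4 = 1,750)
    max_throughput = single_thread_max * max_cf
    return measured_throughput / max_throughput


print(f"Core busy = {core_busy(1000, 0.8, 1.4):.1%}")
```

Without a value for `max_cf` the denominator is unknown, which is why core busy cannot be derived from thread busy alone.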
CORE productivity and utilization
CORE productivity and utilization
• CORE productivity is the percentage of the maximum core capacity that has been used while the logical core was dispatched to physical hardware
• If CORE productivity equals 100%, all threads on the core are executing work and all core resources are being used
• It is calculated as the ratio of CF to MAX CF
CORE productivity and utilization
MT % PROD = 1.746 / 1.804 × 100 = 96.78
CORE productivity and utilization
• LPAR busy simply tells you that the logical core is dispatched
• CORE utilization is supposed to be a more precise metric than LPAR busy; it should tell you how much work the CORE can still execute
• CORE utilization is calculated by multiplying LPAR busy by CORE productivity
CORE productivity and utilization
MT % UTIL = 76.08 × 96.48 / 100 = 73.40
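The two calculations can be written as a pair of small helper functions. This is a minimal sketch; the function names are hypothetical, and the inputs are used exactly as quoted on the two slides.

```python
def core_productivity(cf: float, max_cf: float) -> float:
    """CORE productivity (MT % PROD) as a percentage: CF over MAX CF."""
    return 100.0 * cf / max_cf


def core_utilization(lpar_busy_pct: float, productivity_pct: float) -> float:
    """CORE utilization (MT % UTIL): LPAR busy scaled by CORE productivity."""
    return lpar_busy_pct * productivity_pct / 100.0


# Inputs as quoted on the productivity slide
print(f"MT % PROD = {core_productivity(1.746, 1.804):.2f}")
# Inputs as quoted on the utilization slide
print(f"MT % UTIL = {core_utilization(76.08, 96.48):.2f}")
```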
MT-1 equivalent time
MT-1 Equivalent Time
• With SMT enabled, all accounting fields (SMF 30, 72, etc.) report the CPU consumption of workloads as MT-1 Equivalent Time and Service Units
• MT-1 Equivalent Time is the CPU time that it would have taken to run the same work in MT-1 mode
• MT-1 Equivalent Time is internally calculated as MAX CF × CORE busy time / TD
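The formula on this slide can be sketched as a one-line function. The name `mt1_equivalent_time` is hypothetical, and the 100 seconds of core busy time in the usage line is an illustrative value, not a measurement; MAX CF and TD are the zIIP values from the RMF snapshot shown earlier.

```python
def mt1_equivalent_time(core_busy_seconds: float,
                        max_cf: float,
                        avg_td: float) -> float:
    """MT-1 Equivalent Time = MAX CF * CORE busy time / TD."""
    if avg_td <= 0:
        raise ValueError("thread density must be positive")
    return max_cf * core_busy_seconds / avg_td


# Hypothetical 100 s of core busy time with MAX CF = 1.804 and TD = 1.928
print(f"{mt1_equivalent_time(100.0, 1.804, 1.928):.2f} MT-1 equivalent seconds")
```

With MAX CF and TD both equal to 1, as in MT-1 mode, the result is simply the core busy time.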
MT-1 Equivalent Time
• The most important consequence of MT-1 Equivalent Time measurement is that, when working in MT-2, you have to change the calculation of the capacity used by any workload
• Example of old algorithm:
  ✓ Workload A used 1,800 CPU seconds in 1 hour
  ✓ 1 CORE is rated at 1,000 MIPS
  ✓ used COREs = 1,800 / 3,600 = 0.5
MT-1 Equivalent Time
• Now you need to include MAX CF in the formula
• Example of new algorithm:
  ✓ Workload A used 1,800 CPU seconds in 1 hour
  ✓ 1 CORE is rated at 1,000 MIPS
  ✓ MAX CF is 1.25
  ✓ used COREs = 1,800 / (3,600 × 1.25) = 0.4
  ✓ used MIPS = 1,000 × 0.4 = 400
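The old and new algorithms differ only in the MAX CF term, so both can be expressed with a single function. This is a sketch using the slide's example values; the function names are hypothetical.

```python
def used_cores(cpu_seconds: float,
               interval_seconds: float,
               max_cf: float = 1.0) -> float:
    """Cores used by a workload; max_cf = 1.0 reproduces the old algorithm."""
    return cpu_seconds / (interval_seconds * max_cf)


def used_mips(cores: float, mips_per_core: float) -> float:
    """Convert used cores to MIPS given the rating of one core."""
    return cores * mips_per_core


old = used_cores(1800, 3600)                # old algorithm, no MAX CF
new = used_cores(1800, 3600, max_cf=1.25)   # new algorithm with MAX CF
print(f"old algorithm: {old:.1f} cores")
print(f"new algorithm: {new:.1f} cores, {used_mips(new, 1000):.0f} MIPS")
```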
Conclusions
Conclusions
• The introduction of SMT changed an important part of the mainframe terminology
• With SMT, new metrics have been added which have to be clearly understood in order to perform Capacity Management activities correctly
• Most of the currently used accounting formulas should be reviewed, especially if SMT is extended to standard CPs
Questions?
fabio.ottaviani@epvtech.com
epv.support@epvtech.com
www.epvtech.com