Lecture 1: Introduction
Course Outline
The aim of this course:
• An introduction to the methods and techniques of performance analysis of computer systems.
• Solving computer performance analysis problems related to:
  • measuring the performance of computer systems,
  • comparing computer systems,
  • predicting future performance under different configurations,
  • designing new applications that meet performance requirements,
  • capacity planning.
• Hands-on experiments on modern hardware/software systems.
Course Outline
1. Introduction
2. Hardware and software aspects of computer systems
3. Performance metrics
4. Performance measurement tools and techniques
5. Benchmarking
6. Statistical analysis of performance experiments
7. Design of experiments
8. Processor performance
   • ALU
   • Pipelining
   • Optimizing program performance
9. Memory hierarchy
   • Cache performance
   • Optimizing program performance
10. Performance of multiprocessor systems
11. Simulation
12. Queueing theory
Course Outline
Textbook:
• D. Lilja, “Measuring Computer Performance: A Practitioner's Guide”, Cambridge University Press
Reference books:
• R. Jain, “The Art of Computer Systems Performance Analysis”, John Wiley
• P. J. Fortier, H. E. Michel, “Computer Systems Performance Evaluation and Prediction”, Digital Press
• K. R. Wadleigh, I. L. Crawford, “Software Optimization for High Performance Computing”, Prentice Hall
• R. E. Bryant, D. R. O’Hallaron, “Computer Systems: A Programmer’s Perspective”, Pearson
• J. L. Hennessy, D. A. Patterson, “Computer Architecture”, Morgan Kaufmann
Course Outline
Grading:
• Assignments
• Midterm
• Final Exam
Weights: 30%, 40%
Performance Evaluation of Computer Systems
Computer systems consist of:
• Processor
• Memory
• Input/Output
• Operating system
• Network
[Figure: processors (P) exchanging instructions and data with memory and with input and output units; several processors connected through a network]
Performance Evaluation of Computer Systems
Performance depends on:
• Technology
Technology
• Over the decades, microprocessors have become smaller and denser.

Year                          1945             2010
Computer                     ENIAC           Laptop
Devices                     18 000       17 000 000
Weight (kg)                 27 200              2.8
Size (m³)                       68           0.0018
Power (watts)               20 000              5.5
Cost ($)                 4 630 000            1 000
Memory (bytes)                 200    2 147 483 648
Performance (Flop/s)           800        2 000 000
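To make the comparison concrete, improvement factors can be computed directly from five of the table's rows. The Python sketch below is illustrative only (the names `ENIAC`, `LAPTOP` and `factor` are invented here); it divides the 1945 value by the 2010 value, or the reverse for memory, where a larger number is better.

```python
# Illustrative sketch: improvement factors from the ENIAC (1945) vs.
# laptop (2010) table. Values are copied from the table.
ENIAC = {"weight_kg": 27_200, "size_m3": 68, "power_w": 20_000,
         "cost_usd": 4_630_000, "memory_bytes": 200}
LAPTOP = {"weight_kg": 2.8, "size_m3": 0.0018, "power_w": 5.5,
          "cost_usd": 1_000, "memory_bytes": 2_147_483_648}

# For memory a larger value is better; for the other rows smaller is better.
HIGHER_IS_BETTER = {"memory_bytes"}


def factor(metric: str) -> float:
    """Return how many times 'better' the laptop is than ENIAC for `metric`."""
    old, new = ENIAC[metric], LAPTOP[metric]
    return new / old if metric in HIGHER_IS_BETTER else old / new


if __name__ == "__main__":
    for metric in ENIAC:
        print(f"{metric:>12}: {factor(metric):>14,.0f}x")
```

With these figures the laptop is 4,630 times cheaper and holds roughly ten million times more bytes of memory.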
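Moore’s Law
• In 1965 Gordon Moore predicted that the number of transistors on a semiconductor chip would keep doubling at a regular interval. He first estimated one year and revised this to two years in 1975; the figure most often quoted is roughly every 18 months.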
Moore’s Law
• The number of transistors and the performance both double roughly every 1.5 years.
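A fixed doubling period T gives the exponential N(t) = N0 · 2^(t/T). A minimal Python sketch, assuming T stays constant; the function name `projected` and the starting count of one million transistors are hypothetical, chosen only for illustration.

```python
def projected(n0: float, years: float, doubling_period_years: float = 1.5) -> float:
    """Project a quantity that doubles every `doubling_period_years` years.

    Implements N(t) = N0 * 2 ** (t / T) with t = years and T = doubling period.
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return n0 * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point: a chip with 1 million transistors.
    print(f"{projected(1_000_000, 15):,.0f} transistors after 15 years")  # 1,024,000,000
```

Fifteen years at a 1.5-year period is ten doublings, a factor of 1024; passing `doubling_period_years=2` gives the two-year variant instead.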
Top 500 List, November 2017
Rank  System                           Cores  Rmax (TFlop/s)  Rpeak (TFlop/s)  Power (kW)
1     Sunway TaihuLight (China)   10,649,600          93,014          125,435      15,371
2     Tianhe-2 (China)             3,120,000          33,862           54,902      17,808
3     Piz Daint (Switzerland)        361,760          19,590           25,326       2,272
4     Gyoukou (Japan)             19,860,000          19,135           28,192       1,350
5     Titan (United States)          560,640          17,590           27,112       8,209
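Two ratios are commonly derived from such a table: Rmax/Rpeak, the share of the theoretical peak achieved in the measured LINPACK run, and performance per watt. A short Python sketch using the table's figures; the helper names `efficiency` and `gflops_per_watt` are illustrative.

```python
# Figures from the November 2017 Top 500 table:
# (system, Rmax in TFlop/s, Rpeak in TFlop/s, power in kW)
SYSTEMS = [
    ("Sunway TaihuLight", 93_014, 125_435, 15_371),
    ("Tianhe-2", 33_862, 54_902, 17_808),
    ("Piz Daint", 19_590, 25_326, 2_272),
    ("Gyoukou", 19_135, 28_192, 1_350),
    ("Titan", 17_590, 27_112, 8_209),
]


def efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of the theoretical peak achieved on the benchmark."""
    return rmax / rpeak


def gflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """Energy efficiency: convert TFlop/s -> GFlop/s and kW -> W, then divide."""
    return (rmax_tflops * 1_000) / (power_kw * 1_000)


if __name__ == "__main__":
    for name, rmax, rpeak, power in SYSTEMS:
        print(f"{name:<18} {efficiency(rmax, rpeak):6.1%} of peak, "
              f"{gflops_per_watt(rmax, power):5.2f} GFlop/s per W")
```

With these numbers Gyoukou has the best performance per watt of the five, even though it ranks fourth.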
Units of High Performance Computing
Prefix  Speed                          Capacity
Kilo    1 Kflop/s = 10^3 Flop/s        1 KB = 10^3 bytes
Mega    1 Mflop/s = 10^6 Flop/s        1 MB = 10^6 bytes
Giga    1 Gflop/s = 10^9 Flop/s        1 GB = 10^9 bytes
Tera    1 Tflop/s = 10^12 Flop/s       1 TB = 10^12 bytes
Peta    1 Pflop/s = 10^15 Flop/s       1 PB = 10^15 bytes
Exa     1 Eflop/s = 10^18 Flop/s       1 EB = 10^18 bytes
Zetta   1 Zflop/s = 10^21 Flop/s       1 ZB = 10^21 bytes
Current fastest: 125 Pflop/s peak, 1.3 PB memory
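Each prefix step is a factor of 10^3, so converting a raw rate into these units is repeated division by 1000. A small Python helper written for illustration (the name `format_flops` is made up):

```python
# Decimal (power-of-1000) prefixes as used in the table above.
PREFIXES = ["", "K", "M", "G", "T", "P", "E", "Z"]


def format_flops(rate: float) -> str:
    """Format a rate in Flop/s using the largest prefix that keeps the value >= 1."""
    if rate < 0:
        raise ValueError("rate must be non-negative")
    index = 0
    while rate >= 1000 and index < len(PREFIXES) - 1:
        rate /= 1000
        index += 1
    return f"{rate:.3g} {PREFIXES[index]}flop/s"


if __name__ == "__main__":
    print(format_flops(125e15))   # 125 Pflop/s
    print(format_flops(70e9))     # 70 Gflop/s
```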
Top 500 List
[Figure: performance development of the Top 500 list; annotations: "9 years", laptop ≈ 70 GFlop/s, mobile phone ≈ 4 GFlop/s]
Moore’s Law
Limits of Moore’s Law:
• Moore’s Law describes exponential growth, and exponential growth cannot last forever.
• Heat dissipation is a problem in today’s CPUs.
• The size of atoms is the fundamental barrier.
Moore’s Law Reinterpreted
• The number of cores per chip doubles every 2 years, while clock speed no longer increases and may even decrease.
  • Multicore architectures
Performance Evaluation of Computer Systems
Performance depends on:
• Technology
• Instruction Set Architecture
Instruction Set Architecture (ISA)
Instruction set design:
• RISC / CISC
  • Code density
• Number of operands
  • Stack machines (0-operand)
  • Accumulator machines (1-operand)
  • Register machines (2-operand, 3-operand)
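The operand-count categories can be illustrated with the classic statement C = A + B. The Python sketch below uses three toy interpreters with made-up mnemonics (PUSH/POP, LOAD/ADD/STORE); it models no real instruction set, covers only the 3-operand register case, and treats the number of explicit operands as a rough proxy for code density, since real encodings differ in how many bits each operand needs.

```python
# Toy interpreters (not any real ISA) that compute C = A + B on three
# machine styles, to compare instruction and operand counts.

def run_stack(program, memory):
    """0-operand arithmetic: ADD pops two values and pushes the sum."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory


def run_accumulator(program, memory):
    """1-operand arithmetic: the accumulator is an implicit operand."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory


def run_register(program, memory):
    """3-operand load-store style: ADD rd, rs1, rs2 works only on registers."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":
            regs[args[0]] = memory[args[1]]
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":
            memory[args[1]] = regs[args[0]]
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory


PROGRAMS = {
    "stack": (run_stack, [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]),
    "accumulator": (run_accumulator, [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")]),
    "register": (run_register, [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                                ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")]),
}


def operand_count(program):
    """Total number of explicit operands (a rough proxy for code size)."""
    return sum(len(instr) - 1 for instr in program)


if __name__ == "__main__":
    for style, (run, program) in PROGRAMS.items():
        result = run(program, {"A": 2, "B": 3})
        print(f"{style:<12} C={result['C']}  instructions={len(program)}  "
              f"operands={operand_count(program)}")
```

All three compute C = 5, but the register version names 9 operands against 3 for the other two, which hints at why stack and accumulator encodings tend to be compact while 3-operand register machines trade code size for flexibility.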
Performance Evaluation of Computer Systems
Performance depends on:
• Technology
• Instruction Set Architecture
• Organization
Processor-Memory Problem
[Figure: Processor-Memory Performance Gap]
• The processor-memory performance gap grows about 50% per year.
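Growth of 50% per year compounds: after t years the gap is 1.5^t times its starting size. A minimal Python sketch of that arithmetic, assuming the growth rate stays constant (the function names are illustrative):

```python
import math


def gap_factor(years: float, annual_growth: float = 0.50) -> float:
    """How many times wider the gap is after `years` of compound growth."""
    return (1 + annual_growth) ** years


def years_to_reach(factor: float, annual_growth: float = 0.50) -> float:
    """Years of compound growth needed for the gap to widen by `factor`."""
    return math.log(factor) / math.log(1 + annual_growth)


if __name__ == "__main__":
    print(f"After 10 years the gap is {gap_factor(10):.1f}x wider")   # 57.7x
    print(f"A 10x wider gap takes {years_to_reach(10):.1f} years")    # 5.7 years
```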
Organization: Memory Hierarchy
[Figure: CPU → Registers → L1 Cache → L2 Cache → Main Memory → Disk → Tape]

Level                                            Speed    Size
Within the processor (registers, on-chip cache)  1 ns     Bytes
L2 cache (SRAM)                                  10 ns    KBytes
Main memory (DRAM)                               100 ns   MBytes
Secondary storage (Disk)                         10 ms    GBytes
Tertiary storage (Tape/Disk)                     10 s     TBytes
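The speed column suggests why a hierarchy pays off: if most accesses are served by a fast level, the average cost stays close to that level's latency. A simplified Python sketch for two levels, using the table's 10 ns and 100 ns figures; the hit rates are hypothetical, and the model assumes a miss costs only the main-memory time, ignoring the extra lookup a real miss incurs.

```python
def effective_access_time(hit_rate: float, t_cache_ns: float, t_memory_ns: float) -> float:
    """Simplified two-level model: hits cost t_cache, misses cost t_memory.

    Assumes a miss pays only the main-memory time (no extra cache lookup cost).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * t_cache_ns + (1.0 - hit_rate) * t_memory_ns


if __name__ == "__main__":
    # Latencies from the table (L2 cache ~10 ns, DRAM ~100 ns);
    # the hit rates are hypothetical.
    for h in (0.80, 0.95, 0.99):
        print(f"hit rate {h:.0%}: {effective_access_time(h, 10, 100):.1f} ns")
```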
Organization: Manycore Chips
[Figure: single-core vs. dual-core chip organization, showing the CPU(s), registers, L1 cache, L2 cache and main memory]
Performance Evaluation of Computer Systems
Performance depends on:
• Technology
• Instruction Set Architecture
• Organization
• Software
Software
• The primary duty of software developers is to create functionally correct programs.
• Performance evaluation is the part of software development that aims at well-performing programs.
Performance Analysis Cycle
[Figure: code development produces a functionally complete and correct program; it then goes through a Measure → Analyze → Modify/Tune loop until it becomes a complete, correct and well-performing program, which is put into usage]
• Have an optimization phase, just like the testing and debugging phases.
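One pass through the Measure → Analyze → Modify/Tune loop can be sketched in Python with the standard `timeit` module. The two functions are hypothetical stand-ins for an original and a tuned version: the tuned one replaces a loop with the closed-form sum of squares, and the equality check reflects the requirement that the program stay correct while it is tuned.

```python
import timeit


def sum_squares_naive(n: int) -> int:
    """Original version: explicit loop over 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_tuned(n: int) -> int:
    """Tuned version: closed form for 0^2 + 1^2 + ... + (n-1)^2."""
    return (n - 1) * n * (2 * n - 1) // 6


def measure(func, n: int, repeats: int = 5) -> float:
    """Best-of-`repeats` wall-clock time for a single call, in seconds."""
    timer = timeit.Timer(lambda: func(n))
    return min(timer.repeat(repeat=repeats, number=1))


if __name__ == "__main__":
    n = 1_000_000
    # Correctness first: tuning must not change the result.
    assert sum_squares_naive(n) == sum_squares_tuned(n)
    before = measure(sum_squares_naive, n)
    after = measure(sum_squares_tuned, n)
    print(f"naive: {before:.4f} s, tuned: {after:.6f} s, "
          f"speedup: {before / max(after, 1e-12):.0f}x")
```

Taking the minimum of several repetitions is one common way to reduce noise from other activity on the machine; the absolute timings will differ from system to system.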
Systematic Approach to Performance Evaluation
1. Define the system
2. List services offered by the system
3. Select performance metrics
4. List system and workload parameters
5. Select factors and their values
6. Select evaluation technique
7. Select the workload
8. Design the experiment
9. Analyze the data
10. Present the results
1. Define the system
An example: a client and a server connected by a network.
[Figure: Client ↔ Network ↔ Server]
2. List services offered by the system
Service: remote procedure call (RPC)
3. Select performance metrics
Metrics:
• Time taken for the service
  • Elapsed time
  • Local CPU time
  • Remote CPU time
• The rate at which the service can be performed
  • Calls per second
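In a measurement-based study, elapsed time, local CPU time and call rate can be collected with Python's `time.perf_counter` (wall clock) and `time.process_time` (CPU time of the current process). In this sketch a hypothetical local function stands in for the RPC, since no real client/server setup is assumed; remote CPU time is left out because it would have to be measured on the server.

```python
import time


def stand_in_call(payload_size: int) -> int:
    """Hypothetical local stand-in for the remote procedure call."""
    return sum(range(payload_size))


def measure_calls(num_calls: int, payload_size: int) -> dict:
    """Measure elapsed (wall-clock) time, local CPU time and call rate.

    Remote CPU time is not measured here: it would need server-side timing.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(num_calls):
        stand_in_call(payload_size)
    elapsed = time.perf_counter() - wall_start
    local_cpu = time.process_time() - cpu_start
    return {
        "elapsed_s": elapsed,
        "local_cpu_s": local_cpu,
        # Rate metric: calls completed per second of elapsed time.
        "calls_per_s": num_calls / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    print(measure_calls(num_calls=1_000, payload_size=10_000))
```

For a real RPC, elapsed time would also include network transfer and server work, so it would typically exceed local CPU time by a larger margin than in this purely local sketch.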
4. List system and workload parameters
• System parameters
  • Speed of the network
  • Speed of the local CPU
  • Speed of the remote CPU
  • Operating system overhead
• Workload parameters
  • Time between successive calls
  • Number and sizes of the call parameters
5. Select factors and their values
Factors are the parameters to be varied; their values are called levels. For example:
• Factor: speed of the network; 2 levels: short distance (on campus), long distance (across the country)
• Factor: sizes of the call parameters; 2 levels: small, large
• Factor: number of consecutive calls; 11 levels: 1, 2, 4, 8, …, 1024
6. Select evaluation technique
Three techniques:
• Analytical modeling
• Simulation
• Measuring the real system
7. Select the workload
Depending on the evaluation technique, the workload may be expressed in different forms:
• Analytical modeling: probability of various requests
• Simulation: a trace of requests measured on a real system
• Measurement: user programs
8. Design the experiment
• In the example: 2 × 2 × 11 = 44 experiments
• Phase 1: the number of factors is large but the number of levels is small
• Phase 2: reduce the number of factors and increase the number of levels
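Enumerating every combination of levels (a full factorial design) confirms the count of 2 × 2 × 11 = 44. A short Python sketch using `itertools.product`; the dictionary keys and shortened level labels are names chosen for this illustration.

```python
from itertools import product

# Factors and levels from step 5 (level labels shortened).
FACTORS = {
    "network": ["short distance", "long distance"],
    "parameter_size": ["small", "large"],
    "consecutive_calls": [2 ** k for k in range(11)],  # 1, 2, 4, ..., 1024
}


def full_factorial(factors: dict) -> list:
    """Return every combination of levels as one dict per experiment."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


if __name__ == "__main__":
    experiments = full_factorial(FACTORS)
    print(len(experiments))  # 44
    print(experiments[0])
```

In a two-phase approach, phase 1 would run such a design with few levels per factor, and phase 2 would repeat it with fewer factors but longer level lists.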
9. Analyze the data
• Analysis of variance
• Regression
• etc.
10. Present the results
• Present the data in graphical form rather than as tables of statistical results.
Top 500 List at June 2016
1. TaihuLight (China, NRCPC): Sunway MPP; 10,649,600 cores; Rmax 93,014 TFlop/s; Rpeak 125,435 TFlop/s
2. Tianhe-2 (China, NUDT): Xeon 2.2 GHz + Xeon Phi + custom interconnect; 3,120,000 cores; Rmax 33,862 TFlop/s; Rpeak 54,902 TFlop/s
3. Titan (USA, Cray Inc.): Opteron 2.2 GHz + Nvidia GPU + Cray Gemini interconnect; 560,640 cores; Rmax 17,590 TFlop/s; Rpeak 27,112 TFlop/s
Performance Units
Speed:
• 1 Mflop/s (1 Megaflop/s) = 10^6 Flop/second
• 1 Gflop/s (1 Gigaflop/s) = 10^9 Flop/second
• 1 Tflop/s (1 Teraflop/s) = 10^12 Flop/second
• 1 Pflop/s (1 Petaflop/s) = 10^15 Flop/second
• 1 Eflop/s (1 Exaflop/s) = 10^18 Flop/second
Storage:
• 1 MB (1 Megabyte) = 10^6 Bytes
• 1 GB (1 Gigabyte) = 10^9 Bytes
• 1 TB (1 Terabyte) = 10^12 Bytes
• 1 PB (1 Petabyte) = 10^15 Bytes