Performance CSCI 312 Computer Organization and Architecture Fall 2019 Lecture note Dr. Sajedul Talukder 28 February
Basic Terminology • Bits • The smallest unit of information in a computer • 0 or 1 • Bytes • 8 bits • E.g. $0100\,1010_2 = 4\mathrm{A}_{16} = 74_{10}$ = ‘J’ • Each group of 4 bits (place values 8 4 2 1) is one hex digit: 0100 = 4, 1010 = A
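The byte example above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example, and the character mapping assumes ASCII/Unicode code points.

```python
# Interpret the bit pattern 0100 1010 from the slide in several ways.
bits = "01001010"

value = int(bits, 2)            # base-2 string -> integer
hex_text = format(value, "X")   # integer -> uppercase hexadecimal digits
char = chr(value)               # integer -> character with that code point

# Split the byte into two 4-bit groups; each group maps to one hex digit.
high_nibble, low_nibble = bits[:4], bits[4:]
high_digit = format(int(high_nibble, 2), "X")
low_digit = format(int(low_nibble, 2), "X")

print(value, hex_text, char)    # decimal, hexadecimal, character
print(high_digit, low_digit)    # the two hex digits, one per nibble
```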
Basic Terminology (cont) • KB • Kilobytes • E.g. 1 KB = 1,024 Bytes ≈ 1,000 Bytes • MB • Megabytes • E.g. 1 MB = 1,048,576 Bytes ≈ 1,000,000 Bytes • GB • Gigabytes • E.g. 1 GB = 1,073,741,824 Bytes ≈ 1,000,000,000 Bytes = $1 \times 10^9$ Bytes • TB • Terabytes • E.g. 1 TB = 1,024 GB ≈ 1,000 GB
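As a rough check of the unit sizes, the sketch below computes each unit as a power of two and compares it with the rounded power-of-ten figure. The constant and function names are hypothetical, chosen only for this example.

```python
# Binary (power-of-two) unit sizes, in bytes.
KB = 2 ** 10
MB = 2 ** 20
GB = 2 ** 30
TB = 2 ** 40

def relative_gap(exact_bytes, rounded_bytes):
    """Fraction by which the exact size exceeds the rounded decimal figure."""
    return (exact_bytes - rounded_bytes) / rounded_bytes

for name, exact, rounded in [
    ("KB", KB, 10 ** 3),
    ("MB", MB, 10 ** 6),
    ("GB", GB, 10 ** 9),
    ("TB", TB, 10 ** 12),
]:
    print(f"1 {name} = {exact:,} bytes, gap vs. decimal: {relative_gap(exact, rounded):.1%}")
```

The gap grows with each unit, so the "≈" approximation is looser for GB and TB than for KB.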
Understanding Performance • Algorithm • Determines number of operations executed • Programming language, compiler, architecture • Determine number of machine instructions executed per operation • Processor and memory system • Determine how fast instructions are executed • I/O system (including OS) • Determines how fast I/O operations are executed
Defining Performance Let’s suppose we define performance in terms of speed.
Defining Performance • Which airplane has the best performance?
CPU Execution
Response Time and Throughput • Response time • How long it takes to do a task • Throughput • Total work done per unit time • e.g., tasks/transactions/… per hour • How are response time and throughput affected by • Replacing the processor with a faster version? • Adding more processors? • We’ll focus on response time for now…
Quick Question Decreasing response time almost always improves throughput. Hence, in case 1 (replacing the processor with a faster version), both response time and throughput are improved. In case 2 (adding more processors), no single task gets done faster, so only throughput increases.
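Relative Performance • Define Performance = 1/Execution Time • “X is n times faster than Y” means $\text{Performance}_X / \text{Performance}_Y = \text{Execution Time}_Y / \text{Execution Time}_X = n$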
Relative Performance • Solution: time taken to run a program • 10 s on A, 15 s on B • $\text{Execution Time}_B / \text{Execution Time}_A = 15\,\text{s} / 10\,\text{s} = 1.5$ • So A is 1.5 times faster than B
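The relative-performance calculation can be sketched in Python as follows. The function names `performance` and `times_faster` are hypothetical and used only for illustration; the numbers are the ones from the slide.

```python
def performance(execution_time_s):
    """Performance defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s

def times_faster(time_x_s, time_y_s):
    """How many times faster X is than Y: perf_X / perf_Y = time_Y / time_X."""
    return time_y_s / time_x_s

time_a = 10.0  # seconds on computer A
time_b = 15.0  # seconds on computer B

n = times_faster(time_a, time_b)
print(f"A is {n} times faster than B")
```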
Measuring Execution Time • Elapsed time • Total response time, including all aspects • Processing, I/O, OS overhead, idle time • Determines system performance • CPU time • Time spent processing a given job • Discounts I/O time, other jobs’ shares • Comprises user CPU time and system CPU time • Different programs are affected differently by CPU and system performance
CPU Clock • A crystal oscillator is an electronic oscillator circuit that uses the mechanical resonance of a vibrating crystal of piezoelectric material to create an electrical signal with a precise frequency, providing a stable clock signal for digital integrated circuits. • Operation of digital hardware is governed by a constant-rate clock • For example, a 200 MHz CPU receives 200 million pulses per second • (Figure: crystal oscillator)
CPU Clocking • Operation of CPU is governed by a constant-rate clock • (Figure: clock waveform; each clock period covers data transfer and computation, followed by a state update) • Clock period: duration of a clock cycle • e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s • Clock frequency (rate): cycles per second • e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^9$ Hz
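Clock period and clock rate are reciprocals of each other, which is why the two examples on this slide describe the same clock. A minimal sketch, with hypothetical helper names:

```python
def clock_period_s(frequency_hz):
    """Clock period in seconds for a clock rate given in hertz."""
    return 1.0 / frequency_hz

def clock_rate_hz(period_s):
    """Clock rate in hertz for a clock period given in seconds."""
    return 1.0 / period_s

period = clock_period_s(4.0e9)   # 4.0 GHz clock
rate = clock_rate_hz(250e-12)    # 250 ps period

print(f"period = {period * 1e12:.0f} ps")
print(f"rate   = {rate / 1e9:.1f} GHz")
```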
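CPU Time • $\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \text{CPU Clock Cycles} / \text{Clock Rate}$ • Performance improved by • Reducing number of clock cycles • Increasing clock rate • Hardware designer must often trade off clock rate against cycle count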
CPU Time Example
CPU Time Example • Computer A: 2 GHz clock, 10 s CPU time • Designing Computer B • Aim for 6 s CPU time • A faster clock is possible, but it causes 1.2 × as many clock cycles • How fast must Computer B's clock be?
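One way to work this example is to use the relation CPU time = clock cycles / clock rate, first to recover the cycle count on A and then to solve for B's clock rate. The sketch below assumes that relation; the function names are hypothetical.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz

def required_clock_rate_hz(cycles, target_time_s):
    """Clock rate needed to finish the given cycles in the target time."""
    return cycles / target_time_s

cycles_a = clock_cycles(10.0, 2.0e9)      # program on A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                 # B needs 1.2x as many cycles
rate_b = required_clock_rate_hz(cycles_b, 6.0)

print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```

Under these assumptions, B has to run at roughly twice A's clock rate to reach the 6 s target, because the extra cycles eat part of the gain.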
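Instruction Count and CPI • CPI: clock cycles per instruction • $\text{Clock Cycles} = \text{Instruction Count} \times \text{CPI}$ • $\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \text{Instruction Count} \times \text{CPI} / \text{Clock Rate}$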
CPI Example
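CPI Example • Computer A: Cycle Time = 250 ps, CPI = 2.0 • Computer B: Cycle Time = 500 ps, CPI = 1.2 • Same ISA • Which is faster, and by how much? • $\text{CPU Time}_A = \text{IC} \times 2.0 \times 250\,\text{ps} = \text{IC} \times 500\,\text{ps}$ (A is faster) • $\text{CPU Time}_B = \text{IC} \times 1.2 \times 500\,\text{ps} = \text{IC} \times 600\,\text{ps}$ • $\text{CPU Time}_B / \text{CPU Time}_A = 600 / 500 = 1.2$ (A is 1.2 times faster)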
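Because both computers implement the same ISA, the instruction count is the same on each and cancels out of the comparison. A short sketch of the calculation, assuming CPU time = instruction count × CPI × cycle time (the helper name is hypothetical):

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps

t_a = time_per_instruction_ps(2.0, 250.0)   # Computer A
t_b = time_per_instruction_ps(1.2, 500.0)   # Computer B

# Same instruction count on both machines, so the ratio of per-instruction
# times equals the ratio of CPU times.
ratio = t_b / t_a
faster = "A" if t_a < t_b else "B"
print(f"{faster} is faster by a factor of {ratio:.1f}")
```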
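CPI in More Detail • If different instruction classes take different numbers of cycles: $\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$ • Weighted average CPI: $\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$, where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$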
CPI Example • Alternative compiled code sequences using instructions in classes A, B, C
Class               A   B   C   Total
CPI for class       1   2   3
IC in sequence 1    2   1   2   2+1+2 = 5 inst.
IC in sequence 2    4   1   1   4+1+1 = 6 inst.
• Sequence 1: IC = 5 • Clock Cycles = 2×1 + 1×2 + 2×3 = 10 • Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6 • Clock Cycles = 4×1 + 1×2 + 1×3 = 9 • Avg. CPI = 9/6 = 1.5
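The two code sequences can be compared with a small Python sketch of the weighted-average CPI calculation. The dictionary layout and function names are assumptions made for this example; the CPI values and instruction counts are the ones from the slide.

```python
# Cycles per instruction for each instruction class (from the slide).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

def total_cycles(ic_by_class):
    """Sum of CPI_i x IC_i over all instruction classes."""
    return sum(CPI_BY_CLASS[cls] * count for cls, count in ic_by_class.items())

def average_cpi(ic_by_class):
    """Weighted average CPI = total clock cycles / total instruction count."""
    return total_cycles(ic_by_class) / sum(ic_by_class.values())

sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in [("Sequence 1", sequence_1), ("Sequence 2", sequence_2)]:
    print(name, "cycles =", total_cycles(seq), "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions but needs fewer clock cycles, so instruction count alone does not decide which sequence is faster.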
Performance Summary • The BIG Picture • Performance depends on • Algorithm: affects IC, possibly CPI • Programming language: affects IC, CPI • Compiler: affects IC, CPI • Instruction set architecture: affects IC, CPI, Tc
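Power Trends (§1.7 The Power Wall) • (Figure labels: more complex pipeline, simpler pipeline, Core 2) • In CMOS IC technology: $\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$ • Annotations: Power ×30, Voltage 5 V → 1 V, Frequency ×1000 • The primary energy consumption in CMOS is dynamic energy, spent when transistors switch on→off and off→on; the switching is controlled by the clock frequency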
Relative Power
Reducing Power • Suppose a new CPU has • 85% of capacitive load of old CPU • 15% voltage and 15% frequency reduction • The power wall • We can’t reduce voltage further • We can’t remove more heat • How to improve overall performance?
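The power reduction in this scenario can be estimated with the usual CMOS dynamic power relation, Power = Capacitive load × Voltage² × Frequency. The sketch below assumes that relation and normalizes the old CPU's values to 1.0; the function name is hypothetical.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power = capacitive load x voltage^2 x frequency."""
    return capacitive_load * voltage ** 2 * frequency

# Old CPU normalized to 1.0 for every quantity.
p_old = dynamic_power(1.0, 1.0, 1.0)

# New CPU: 85% capacitive load, 15% lower voltage, 15% lower frequency.
p_new = dynamic_power(0.85, 0.85, 0.85)

ratio = p_new / p_old   # equals 0.85 ** 4
print(f"new power is about {ratio:.0%} of old power")
```

Voltage appears squared, so it contributes two of the four 0.85 factors. That is why further power savings become hard once voltage cannot be lowered any more.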
Problem
Pitfall: Amdahl’s Law • Improving an aspect of a computer and expecting a proportional improvement in overall performance • $T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$ • Example: multiply accounts for 80 s/100 s • How much improvement in multiply performance to get 5× overall? • $20 = \frac{80}{n} + 20$ • Can’t be done! • Corollary: make the common case fast
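The multiply example can be explored with a short sketch of Amdahl's Law, assuming the standard form: improved time = affected time / improvement factor + unaffected time. The function names are hypothetical.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Total time after speeding up only the affected part by `factor`."""
    return t_affected / factor + t_unaffected

def required_factor(t_affected, t_unaffected, target_total):
    """Improvement factor needed to hit target_total, or None if impossible."""
    remaining = target_total - t_unaffected
    if remaining <= 0:
        # The unaffected part alone already uses up the whole time budget.
        return None
    return t_affected / remaining

t_multiply, t_other = 80.0, 20.0            # seconds, from the slide
target = (t_multiply + t_other) / 5.0       # 5x overall -> 20 s

print(required_factor(t_multiply, t_other, target))   # no finite factor works
print(improved_time(t_multiply, t_other, 4.0))        # 4x faster multiply
```

Even an arbitrarily fast multiplier leaves the 20 s of other work untouched, so the overall speedup stays below 5×.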
Concluding Remarks • Cost/performance is improving • Due to underlying technology development • Hierarchical layers of abstraction • In both hardware and software • Instruction set architecture • The hardware/software interface • Execution time: the best performance measure • Power is a limiting factor • Use parallelism to improve performance
Questions?