CS 465 Computer System Architecture Introduction Lecture 1

• Be able to explain the organization of the classical von Neumann machine and its major functional components
• Be able to compare performance of simple system configurations and understand the performance implications of architectural choices
12/17/2021 CS 465

Class Overview
• Review of syllabus
• Typical lecture format
• Intro to computer architecture
• Organization of a computer
• Prerequisite exam discussion (5 minutes)
• Chapters 1, 6.2, 6.3, 6.4; Appendices B, C

Typical Lectures (3-hour class)
Most lectures:
• Before break – 75 minutes
  – Connect to recent articles, product ideas (student participation): 10 to 15 mins
  – Exam/test review: 15 mins
  – Lecture
• Break – 10 minutes
• After break – 75 minutes
  – Lecture
  – Quiz
  – Record class participation: 2 mins
Lecture + exam:
• Lecture before break
• Break – 10 minutes
• Exam 60 to 90 minutes
OR
• Exam 60 to 90 minutes
• Break – 10 minutes
• Lecture after break

What do you expect?
• WikiLeaks?

Why Are You Here?
• Learn how computers work
• Learn how to optimize your application
  – Analyze and improve performance
  – Embedded vs. desktop computers
  – Real-time, online, and batch applications
• CS 465 is REQUIRED!

The Computer Revolution
• Progress in computer technology
  – Underpinned by Moore’s Law
• Makes novel applications feasible
  – Computers in automobiles
  – Cell phones
  – Human Genome Project
  – World Wide Web
  – Search engines
• Computers are pervasive

Classes of Computers
• Desktop computers
  – General purpose, variety of software
  – Subject to cost/performance tradeoff
• Server computers
  – Network based
  – High capacity, performance, reliability
  – Range from small servers to building-sized
• Embedded computers
  – Hidden as components of systems
  – Stringent power/performance/cost constraints

The Processor Market

Understanding Performance
• Algorithm
  – Determines number of operations executed
• Programming language, compiler, architecture
  – Determine number of machine instructions executed per operation
• Processor and memory system
  – Determine how fast instructions are executed
• I/O system (including OS)
  – Determines how fast I/O operations are executed

§1.4 Performance
Defining Performance
• Which airplane has the best performance?

Factors that Impact Computer Architecture
[Figure: computer architecture at the center, influenced by technology, applications, programming languages, operating systems, market, and history]

Why Study Computer Architecture?
• Rapid change
• New challenges
  – Miniaturization, wearable computers, mobility
• It’s exciting
• That’s what computer scientists do, as compared to programmers
• Helps in making purchasing decisions and giving “expert” advice

Motivating Question
• How would you specify a computer system?

Components of a Computer
The BIG Picture
• Same components for all kinds of computer
  – Desktop, server, embedded
• Input/output includes
  – User-interface devices
    • Display, keyboard, mouse
  – Storage devices
    • Hard disk, CD/DVD, flash
  – Network adapters
    • For communicating with other computers

Anatomy of a Computer
[Figure: a computer with its output device, input device, and network cable labeled]

Anatomy of a Mouse
• Optical mouse
  – LED illuminates desktop
  – Small low-res camera
  – Basic image processor
    • Looks for x, y movement
  – Buttons & wheel
• Supersedes roller-ball mechanical mouse

Through the Looking Glass
• LCD screen: picture elements (pixels)
  – Mirrors content of frame buffer memory

Opening the Box

Inside the Processor (CPU)
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
  – Small, fast SRAM memory for immediate access to data

Inside the Processor
• AMD Barcelona: 4 processor cores

Abstractions
The BIG Picture
• Abstraction helps us deal with complexity
  – Hide lower-level detail
• Instruction set architecture (ISA)
  – The hardware/software interface
• Application binary interface (ABI)
  – The ISA plus system software interface
• Implementation
  – The details underlying an interface

A Safe Place for Data
• Volatile main memory
  – Loses instructions and data when power is off
• Non-volatile secondary memory
  – Magnetic disk
  – Flash memory
  – Optical disk (CD-ROM, DVD)

Networks
• Communication and resource sharing
• Local area network (LAN): Ethernet
  – Within a building
• Wide area network (WAN): the Internet
• Wireless network: Wi-Fi, Bluetooth

Technology Trends
• Electronics technology continues to evolve
  – Increased capacity and performance
  – Reduced cost

Year  Technology                    Relative performance/cost
1951  Vacuum tube                   1
1965  Transistor                    35
1975  Integrated circuit (IC)       900
1995  Very large scale IC (VLSI)    2,400,000
2005  Ultra large scale IC          6,200,000

[Figure: DRAM capacity over time]

CPU
• Clock frequency
  – Period = 1/frequency
  – Typical values: several MHz to a few GHz
  – Frequency vs. power
• Word size
  – Address word (a): 16 to 64 bits
    • Max memory = 2^a bytes/words
  – Data word: 16 to 64 bits
    • Typical operation in an instruction
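The two relationships on this slide can be checked with a few lines of Python: the clock period is the reciprocal of the frequency, and an a-bit address reaches 2^a locations. This is a minimal sketch; the function names are illustrative, and it assumes byte addressing (one byte per address).

```python
def clock_period_ns(frequency_hz):
    """Clock period in nanoseconds: period = 1 / frequency."""
    return 1e9 / frequency_hz


def max_memory_bytes(address_bits):
    """Locations reachable with an a-bit address: 2^a (bytes, assuming byte addressing)."""
    return 2 ** address_bits


print(clock_period_ns(2e9))            # 0.5 (ns) for a 2 GHz clock
for a in (16, 32, 64):
    print(f"{a}-bit address: {max_memory_bytes(a):,} bytes")
```

A 32-bit address, for example, gives 4,294,967,296 bytes (4 GiB).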

CPU (continued)
• Millions of transistors on chip
• Usually includes
  – Registers
  – Datapaths
  – Control
  – Internal cache
    • Data, instruction
• Instruction set
• CS 465 focus: instruction set architecture, datapath, and control

Memory: RAM
• Volatile
• Random access
  – Same time to access all memory locations
• DRAM: dynamic, requires refresh, smaller footprint
  – Several Mb per chip; capacity increases ~60% per year
  – 10 to 50 ns access time; speed improves ~10% per year
• SRAM: static, no refresh, bigger footprint
  – Several Mb per chip; 2 to 15 ns access time

Memory: ROM
• Non-volatile
• ROM: large volume; look-up table
• PROM: programmable
• EPROM: erase with UV exposure
  – Development process
• EEPROM: change a specific memory location
  – POS

Question
• If the fastest (cache) memory access time is 5 ns, then the CPU clock should be
  – 1 ns
  – 3 ns
  – 6 ns
  – 9 ns

Secondary Memory: Disk
• Magnetic hard drive
  – Several GB
  – Access time = seek time + average rotational latency
    • Seek time: 5 to 20 ms
    • Rotational speed: 3600 to 9600 RPM (average rotational latency = time for half a rotation)
  – Data transfer rate: several MB per second
• Floppy drive: 100 ms, 1.44 MB

I/O Example: Disk Drives
[Figure: disk platters with tracks, sectors, and cylinders]
• To access data:
  – Seek: position the head over the proper track (8 to 20 ms average)
  – Rotational latency: wait for the desired sector (0.5 rotation / RPM, i.e., 0.5 × 60 / RPM seconds)
  – Transfer: grab the data (one or more sectors), 2 to 15 MB/sec
© 1998 Morgan Kaufmann Publishers

Disk Read Time
• Disk read time = seek time + average rotational latency + data transfer time + controller time
• Average rotational latency at 5400 RPM:
  – 0.5 rotation / (5400 rotations/min / 60 s/min) = 0.5 / 90 s ≈ 5.6 ms
• Data transfer time for a 512-byte sector at 10 MB/sec:
  – 512 bytes / 10 MB/sec ≈ 0.05 ms
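The read-time sum can be sketched in Python as below. The 5400 RPM, 512-byte sector, and 10 MB/sec figures come from the slide; the 12 ms seek and 2 ms controller times are hypothetical placeholders (the slides give seek only as a 5 to 20 ms range and no controller figure), and 10 MB/sec is taken as 10^7 bytes per second.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: time for half a rotation, in milliseconds."""
    rotations_per_second = rpm / 60.0
    return 0.5 / rotations_per_second * 1000.0


def transfer_time_ms(sector_bytes, rate_bytes_per_s):
    """Time to move the requested bytes at the sustained transfer rate, in ms."""
    return sector_bytes / rate_bytes_per_s * 1000.0


def disk_read_time_ms(seek_ms, rpm, sector_bytes, rate_bytes_per_s, controller_ms):
    """Seek + average rotational latency + transfer + controller overhead."""
    return (seek_ms
            + avg_rotational_latency_ms(rpm)
            + transfer_time_ms(sector_bytes, rate_bytes_per_s)
            + controller_ms)


latency = avg_rotational_latency_ms(5400)       # about 5.56 ms
transfer = transfer_time_ms(512, 10e6)          # about 0.05 ms
# Hypothetical seek and controller times, for illustration only.
total = disk_read_time_ms(seek_ms=12.0, rpm=5400, sector_bytes=512,
                          rate_bytes_per_s=10e6, controller_ms=2.0)
print(f"latency {latency:.2f} ms, transfer {transfer:.4f} ms, total {total:.2f} ms")
```

With these assumed values, seek time dominates; the transfer of a single sector is almost negligible.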

Secondary Memory: Tape
• Sequential, off-line storage; disaster recovery
• Density: thousands of bits per inch
• Speed: tens of inches per second
• Magnetic: several GB per tape
• Optical: several TB per tape

Input and Output Devices
• Input devices
  – Keyboard: 0.01 KB/s
  – Mouse: 0.02 KB/s
  – Scanner
  – Voice
• Output devices
  – Printer
  – Monitor
    • CRT
    • LCD
  – Voice
• Devices are very slow compared to the CPU

Network Devices
• Modem: start and stop bits, character synchronization, 56 Kbps
• LAN: Ethernet, 10/100 Mbps, Carrier Sense Multiple Access with Collision Detection (CSMA/CD), back-off, hubs
• WAN: interconnected systems, routers

$100 Laptop?

Response Time and Throughput
• Response time
  – How long it takes to do a task
• Throughput
  – Total work done per unit time
    • e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
  – Replacing the processor with a faster version?
  – Adding more processors?
• We’ll focus on response time for now…

Relative Performance
• Define Performance = 1/Execution Time
• “X is n times faster than Y” means
  Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n
• Example: time taken to run a program
  – 10 s on A, 15 s on B
  – Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5
  – So A is 1.5 times faster than B
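The definition above reduces to a ratio of execution times. A minimal Python sketch (function names are illustrative) using the slide's 10 s and 15 s example:

```python
def performance(execution_time_s):
    """Performance is defined as 1 / execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """n in 'X is n times faster than Y': Perf_X / Perf_Y = Time_Y / Time_X."""
    return time_y_s / time_x_s


# Slide example: the program takes 10 s on A and 15 s on B.
print(times_faster(10.0, 15.0))   # 1.5, so A is 1.5 times faster than B
```

Computing the ratio of times directly avoids the rounding that dividing two reciprocals can introduce.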

Measuring Execution Time
• Elapsed time
  – Total response time, including all aspects
    • Processing, I/O, OS overhead, idle time
  – Determines system performance
• CPU time
  – Time spent processing a given job
    • Discounts I/O time, other jobs’ shares
  – Comprises user CPU time and system CPU time
  – Different programs are affected differently by CPU and system performance

SPEC CPU Benchmark
• Programs used to measure performance
  – Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  – Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
  – Elapsed time to execute a selection of programs
    • Negligible I/O, so focuses on CPU performance
  – Normalize relative to reference machine
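SPEC CPU normalizes each program as a ratio of the reference machine's time to the measured time (a SPECratio) and summarizes the ratios with a geometric mean. The Python sketch below shows that calculation; the benchmark times are made up for illustration and are not real SPEC results.

```python
import math


def spec_ratio(reference_time_s, measured_time_s):
    """Normalized score: how many times faster than the reference machine."""
    return reference_time_s / measured_time_s


def geometric_mean(values):
    """n-th root of the product, computed through logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference time, measured time) pairs in seconds.
runs = [(9000.0, 600.0), (7000.0, 500.0), (10000.0, 400.0)]
ratios = [spec_ratio(ref, t) for ref, t in runs]
print(ratios)                    # [15.0, 14.0, 25.0]
print(geometric_mean(ratios))
```

A useful property of the geometric mean here: the ratio of two machines' geometric means does not depend on which reference machine was used for normalization.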

Propagation Delay
• What is the propagation delay for a 200 MHz signal traveling 100 meters?
• Speed of light is 3 × 10^8 meters per second.
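One way to work the question is sketched below in Python: divide distance by the speed given on the slide, then compare the delay with the 5 ns period of a 200 MHz signal. It assumes the signal travels at the speed of light; real cables are typically slower, so this is a lower bound.

```python
SPEED_OF_LIGHT_M_PER_S = 3e8   # value given on the slide


def propagation_delay_s(distance_m, speed_m_per_s=SPEED_OF_LIGHT_M_PER_S):
    """Time in seconds for a signal to cover distance_m at speed_m_per_s."""
    return distance_m / speed_m_per_s


delay = propagation_delay_s(100.0)       # 100 m at 3 x 10^8 m/s
period = 1.0 / 200e6                     # one period of a 200 MHz signal (5 ns)
cycles_in_flight = delay / period        # how many clock periods the trip spans
print(f"delay = {delay * 1e9:.1f} ns, about {cycles_in_flight:.1f} clock periods")
```

Roughly 333 ns, or about 67 periods of the 200 MHz signal, are in flight along 100 m.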

Computer System Performance 1
• Consider a computer system that executes a program in 100 s: 90 s of CPU and 10 s of I/O.
• A second program executes in 50 s: 40 s of CPU and 10 s of I/O.
• Run back to back, the total time is 150 s. Can we do better?
[Figure: sequential execution, total = 150 s; second program's CPU work overlapped with the first program's I/O, total = 140 s]
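The 140 s figure comes from letting the second program use the CPU while the first one waits on I/O. The Python sketch below models that under stated assumptions: one CPU, one I/O device, each job does all its CPU work before its I/O, and jobs take the CPU in the order given. It is an illustrative model, not a real scheduler.

```python
def sequential_time(jobs):
    """Run jobs one after another: each job's CPU phase, then its I/O phase."""
    return sum(cpu + io for cpu, io in jobs)


def overlapped_time(jobs):
    """One CPU and one I/O device; a job's I/O may overlap the next job's CPU.

    Each job is (cpu_s, io_s) and must finish its CPU phase before its I/O.
    Jobs use the CPU in the given order; the I/O device serves one job at a time.
    """
    cpu_free = 0.0   # time at which the CPU becomes available
    io_free = 0.0    # time at which the I/O device becomes available
    finish = 0.0
    for cpu, io in jobs:
        cpu_done = cpu_free + cpu
        cpu_free = cpu_done
        io_start = max(cpu_done, io_free)
        io_free = io_start + io
        finish = max(finish, io_free)
    return finish


jobs = [(90.0, 10.0), (40.0, 10.0)]   # (CPU seconds, I/O seconds) from the slide
print(sequential_time(jobs), overlapped_time(jobs))   # 150.0 140.0
```

Running the shorter job first also finishes at 140 s in this model, because the CPU still has 130 s of work and the last job's 10 s of I/O cannot be hidden.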

Computer System Performance 2
• Consider a computer system that executes a program in 100 s: 90 s of CPU and 10 s of I/O.
• CPU performance improves each year; assume CPU time decreases by 33% each year.
• I/O time decreases by 10% per year.
• How much faster will the program run in 5 years?
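A short Python sketch of the compounding, assuming each year's reduction applies to the previous year's time (so after n years CPU time is 90 × 0.67^n and I/O time is 10 × 0.9^n):

```python
def time_after_years(cpu_s, io_s, cpu_reduction, io_reduction, years):
    """Each year CPU time shrinks by cpu_reduction and I/O time by io_reduction."""
    cpu = cpu_s * (1 - cpu_reduction) ** years
    io = io_s * (1 - io_reduction) ** years
    return cpu, io


cpu, io = time_after_years(90.0, 10.0, 0.33, 0.10, 5)
total = cpu + io
print(f"CPU {cpu:.2f} s + I/O {io:.2f} s = {total:.2f} s; "
      f"speedup {100.0 / total:.2f}x")
```

Under these assumptions the program runs in about 18.06 s, roughly 5.5 times faster, and I/O grows from 10% of the run time to about a third of it.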

Computer System Performance 3

CPU Clocking
• Operation of digital hardware is governed by a constant-rate clock
[Figure: clock signal over successive cycles; within each clock period, data transfer and computation occur, then state is updated]
• Clock period: duration of a clock cycle
  – e.g., 250 ps = 0.25 ns = 250 × 10^-12 s
• Clock frequency (rate): cycles per second
  – e.g., 4.0 GHz = 4000 MHz = 4.0 × 10^9 Hz

CPU Time
• Performance improved by
  – Reducing number of clock cycles
  – Increasing clock rate
  – Hardware designer must often trade off clock rate against cycle count
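Both levers follow from the standard CPU-time relation, written here as a worked equation:

```latex
\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time}
               = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
```

Fewer cycles shrink the numerator; a higher clock rate grows the denominator. A design change that raises one usually worsens the other, which is the trade-off noted above.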

CPU Time Example
• Computer A: 2 GHz clock, 10 s CPU time
• Designing Computer B
  – Aim for 6 s CPU time
  – Can use a faster clock, but this causes 1.2 × clock cycles
• How fast must Computer B’s clock be?
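Working the example with CPU Time = Clock Cycles / Clock Rate: find A's cycle count, scale it by 1.2 for B, then divide by B's 6 s target. A minimal Python sketch (the function name is illustrative):

```python
def required_clock_rate_hz(time_a_s, rate_a_hz, target_time_b_s, cycle_factor):
    """Clock rate Computer B needs to finish in target_time_b_s."""
    cycles_a = time_a_s * rate_a_hz        # Clock Cycles_A = CPU Time_A x Clock Rate_A
    cycles_b = cycle_factor * cycles_a     # B needs cycle_factor times as many cycles
    return cycles_b / target_time_b_s      # Clock Rate_B = Clock Cycles_B / CPU Time_B


rate_b = required_clock_rate_hz(time_a_s=10.0, rate_a_hz=2e9,
                                target_time_b_s=6.0, cycle_factor=1.2)
print(f"Computer B needs a {rate_b / 1e9:.1f} GHz clock")   # 4.0 GHz
```

A runs 20 × 10^9 cycles, so B needs 24 × 10^9 cycles in 6 s, which is a 4 GHz clock.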

Instruction Count and CPI
• Instruction count for a program
  – Determined by program, ISA, and compiler
• Average cycles per instruction (CPI)
  – Determined by CPU hardware
  – If different instructions have different CPI
    • Average CPI is affected by instruction mix
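Combining instruction count and CPI with the clock relation gives the classic performance equation, shown here in its standard textbook form:

```latex
\begin{aligned}
\text{Clock Cycles} &= \text{Instruction Count} \times \text{CPI} \\
\text{CPU Time} &= \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}
                 = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}
\end{aligned}
```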

CPI Example
• Computer A: Cycle Time = 250 ps, CPI = 2.0
• Computer B: Cycle Time = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
  – CPU Time_A = IC × 2.0 × 250 ps = IC × 500 ps
  – CPU Time_B = IC × 1.2 × 500 ps = IC × 600 ps
  – A is faster, by 600 ps / 500 ps = 1.2 times
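The comparison can be checked numerically. Because both machines share an ISA, the instruction count cancels; the one million instructions below are a hypothetical value chosen only so the code has something to multiply.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU Time = Instruction Count x CPI x Clock Cycle Time (in picoseconds)."""
    return instruction_count * cpi * cycle_time_ps


ic = 1_000_000                      # hypothetical; same ISA means the same IC on both
time_a = cpu_time_ps(ic, 2.0, 250)  # 500 ps per instruction
time_b = cpu_time_ps(ic, 1.2, 500)  # 600 ps per instruction
print(f"A is {time_b / time_a:.1f} times faster than B")   # 1.2
```

Any other instruction count gives the same 1.2 ratio.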

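CPI in More Detail
• If different instruction classes take different numbers of cycles:
  Clock Cycles = Σ (CPI_i × Instruction Count_i), summed over classes i = 1 to n
• Weighted average CPI:
  CPI = Clock Cycles / Instruction Count = Σ CPI_i × (Instruction Count_i / Instruction Count)
  where Instruction Count_i / Instruction Count is the relative frequency of class i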

CPI Example
• Alternative compiled code sequences using instructions in classes A, B, C

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

• Sequence 1: IC = 5
  – Clock Cycles = 2 × 1 + 1 × 2 + 2 × 3 = 10
  – Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
  – Clock Cycles = 4 × 1 + 1 × 2 + 1 × 3 = 9
  – Avg. CPI = 9/6 = 1.5
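The weighted-CPI arithmetic for both sequences, sketched in Python with the class CPIs and instruction counts from the slide's table (the dictionary layout is just one convenient encoding):

```python
# CPI per instruction class and instruction counts per sequence, from the slide.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
SEQUENCES = {
    "Sequence 1": {"A": 2, "B": 1, "C": 2},
    "Sequence 2": {"A": 4, "B": 1, "C": 1},
}


def cycles_and_cpi(counts):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    instructions = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for name, counts in SEQUENCES.items():
    ic, cycles, cpi = cycles_and_cpi(counts)
    print(f"{name}: IC = {ic}, clock cycles = {cycles}, avg CPI = {cpi:.1f}")
```

Sequence 2 executes more instructions but needs fewer cycles, so at the same clock rate it is the faster sequence. Instruction count alone is a misleading guide.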

Performance Summary
The BIG Picture
• Performance depends on
  – Algorithm: affects IC, possibly CPI
  – Programming language: affects IC, CPI
  – Compiler: affects IC, CPI
  – Instruction set architecture: affects IC, CPI, Tc (clock cycle time)

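§1.5 The Power Wall
Power Trends
• In CMOS IC technology, Power = Capacitive load × Voltage² × Frequency
[Figure: clock rate and power trends; over the period shown, clock rate grew about ×1000 and power about ×30, while supply voltage dropped from 5 V to 1 V]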

Reducing Power
• Suppose a new CPU has
  – 85% of the capacitive load of the old CPU
  – 15% voltage and 15% frequency reduction
• The power wall
  – We can’t reduce voltage further
  – We can’t remove more heat
• How else can we improve performance?
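Assuming the standard CMOS dynamic-power relation (power proportional to capacitive load × voltage² × frequency), the new CPU's power relative to the old one is 0.85 × 0.85² × 0.85. A minimal Python sketch:

```python
def relative_power(cap_ratio, voltage_ratio, freq_ratio):
    """P_new / P_old for dynamic CMOS power, P = C x V^2 x f (C = capacitive load)."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


# New CPU: 85% of the load, voltage and frequency each reduced by 15%.
ratio = relative_power(cap_ratio=0.85, voltage_ratio=1 - 0.15, freq_ratio=1 - 0.15)
print(f"P_new / P_old = {ratio:.2f}")   # 0.52, i.e. 0.85 ** 4
```

Because voltage enters squared, voltage scaling was historically the biggest power lever. That is why the slide's "can’t reduce voltage further" is such a hard limit.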

§1.6 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance
[Figure: uniprocessor performance over time]
• Constrained by power, instruction-level parallelism, memory latency

Multiprocessors
• Multicore microprocessors
  – More than one processor per chip
• Requires explicitly parallel programming
  – Compare with instruction-level parallelism
    • Hardware executes multiple instructions at once
    • Hidden from the programmer
  – Hard to do
    • Programming for performance
    • Load balancing
    • Optimizing communication and synchronization

What is Computer Architecture?
• Computer Architecture = Instruction Set Architecture + Machine Organization

Instruction Set Architecture Definition
• Organization of programmable storage
• Data types and structures
  – Encoding and representation
• Instruction set
• Instruction format
• Addressing modes, accessing data and instructions
• Exception conditions

Instruction Set: Software/Hardware Interface
[Figure: the instruction set as the layer between software (above) and hardware (below)]

MIPS R3000 Instruction Set Architecture (Summary)
• Instruction categories
  – Load/Store
  – Data manipulation
    • Floating point
  – Program manipulation
    • Branch & Jump
  – Special
• Registers: R0 to R31, PC, HI, LO
• 3 instruction formats, all 32 bits wide:
  – Arithmetic: OP | rs | rt | rd | sa | funct
  – Branch, immediate: OP | rs | rt | address/immediate
  – Jump: OP | jump target
Q: How many of you are already familiar with the MIPS ISA?
Copyright 1997 UCB
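To make the arithmetic (R-type) format concrete, here is a Python sketch that packs its six fields into one 32-bit word. The field widths (6, 5, 5, 5, 5, 6 bits) and the funct value 0x20 for add are the standard MIPS encodings; they are not shown on the slide itself.

```python
def encode_r_type(op, rs, rt, rd, sa, funct):
    """Pack MIPS R-format fields (6/5/5/5/5/6 bits) into one 32-bit word."""
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (sa, 5), (funct, 6)]
    word = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        word = (word << bits) | value   # shift earlier fields left, append this one
    return word


# add $8, $9, $10  (rd = rs + rt): op = 0, funct = 0x20 for add
word = encode_r_type(op=0, rs=9, rt=10, rd=8, sa=0, funct=0x20)
print(f"0x{word:08X}")   # 0x012A4020
```

Packing from the leftmost field rightward keeps the code in the same order as the format diagram.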

Examples of ISAs
• IBM 360/370
• Motorola PowerPC
• DEC VAX, Alpha
• HP PA-RISC
• Sun SPARC
• SGI MIPS
• Intel 80x86, Pentium, MMX, Pentium 4

Abstraction: Hides Details
• Chip with millions of devices
• Software with millions of lines of code (instructions)
• We need tools to handle this complexity
  – Hide unnecessary details
  – Communication between layers
• Use of abstraction is essential for complex system design.
• Abstraction is used in software and hardware design.
• Each layer reveals more detail.

High-level language program:
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
    ↓ Compiler
Assembly language program:
  lw $15, 0($2)
  lw $16, 4($2)
  sw $16, 0($2)
  sw $15, 4($2)
    ↓ Assembler
Binary machine language program
  [Figure: 32-bit binary machine words for the four instructions]
    ↓ Machine interpretation
Control signal specification:
  ALUOP[0:3] <= InstReg[9:11] & MASK
Adapted from Patterson, Copyright 1997 UCB
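To see that the four assembly instructions really perform the swap, the Python sketch below interprets just lw and sw on a toy machine state. It is a hypothetical model: memory is a dictionary from byte addresses to word values (so v[k+1] sits 4 bytes after v[k]), and the address 0x1000 for v[k] is made up.

```python
def run_mips(program, regs, memory):
    """Execute a list of MIPS-style lw/sw instructions on a toy machine state.

    Each instruction is (op, rt, offset, base). memory maps byte addresses to
    32-bit word values, so consecutive words sit 4 addresses apart.
    """
    for op, rt, offset, base in program:
        address = regs[base] + offset       # effective address = base register + offset
        if op == "lw":
            regs[rt] = memory[address]      # load word: rt <- Mem[address]
        elif op == "sw":
            memory[address] = regs[rt]      # store word: Mem[address] <- rt
        else:
            raise ValueError(f"unsupported op {op!r}")


# The compiled swap from the slide; $2 holds the address of v[k].
swap = [("lw", 15, 0, 2), ("lw", 16, 4, 2), ("sw", 16, 0, 2), ("sw", 15, 4, 2)]

v_k = 0x1000                         # hypothetical address of v[k]
memory = {v_k: 7, v_k + 4: 42}       # v[k] = 7, v[k+1] = 42
regs = {2: v_k, 15: 0, 16: 0}
run_mips(swap, regs, memory)
print(memory[v_k], memory[v_k + 4])  # 42 7
```

Registers $15 and $16 play the role of temp: both values are loaded before either store, so neither is overwritten early.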

Machine Organization
• Functional units (FUs): capabilities and performance
  – Registers, ALU
• Interconnection of the FUs: datapaths
• Information flow between FUs
• Control architecture
  – Logic to meet instruction set requirements
• Register Transfer Level description

Levels of Organization
[Figure: a computer consists of a processor (control, datapath, cache for data and instructions), memory, and devices (input, output)]
• Workstation design target:
  – 25% of cost on processor
  – 25% of cost on memory (minimum memory size)
  – Rest on I/O devices, power supplies, box
Adapted from Patterson, Copyright 1997 UCB

Von Neumann Architecture
[Figure: fetch, decode, and execute cycle (from Wikipedia)]

Instruction Execution Cycle
• Instruction fetch: obtain instruction from program storage
• Instruction decode: determine required actions and instruction size
• Operand fetch: locate and obtain operand data
• Execute: compute result value or status
• Result store: deposit results in storage for later use
• Next instruction: determine successor instruction
Copyright 1997 UCB
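The six steps can be traced in a toy stored-program machine. The Python sketch below uses a hypothetical accumulator ISA (LOAD, ADD, STORE, HALT), not MIPS. Program and data share one memory list, as in a von Neumann machine, and the comments mark where each step of the cycle happens.

```python
def run_toy_machine(memory):
    """Run a hypothetical accumulator machine until HALT and return the accumulator.

    Program and data share one memory list. Each instruction is an
    (opcode, address) tuple occupying one memory cell.
    """
    acc, pc = 0, 0
    while True:
        opcode, address = memory[pc]      # instruction fetch, then decode into fields
        next_pc = pc + 1                  # decode also fixes the size: one cell here
        if opcode == "LOAD":
            acc = memory[address]         # operand fetch; result stored in acc
        elif opcode == "ADD":
            acc = acc + memory[address]   # operand fetch, execute, result store in acc
        elif opcode == "STORE":
            memory[address] = acc         # result store to memory
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        pc = next_pc                      # next instruction: sequential successor


# Hypothetical program: mem[6] = mem[4] + mem[5]; cells 4 to 6 hold data.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
result = run_toy_machine(memory)
print(result, memory[6])   # 5 5
```

A real processor would also handle branches in the "next instruction" step. Here every instruction simply falls through to pc + 1.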

Processor and Caches
[Figure: SPARCstation 20 MBus module (MBus slots 0 and 1) holding the processor (registers, datapath, control, internal cache) and an external cache]
Copyright 1997 UCB

Memory
[Figure: SPARCstation 20 memory; DRAM SIMMs in SIMM slots 0 to 7, connected over the memory SIMM bus to the memory controller]
Copyright 1997 UCB

Input and Output (I/O) Devices
• SCSI bus: standard I/O devices
• High-speed I/O devices
• External bus: low-speed I/O devices

Standard I/O Devices
• SCSI = Small Computer Systems Interface
• A standard interface (IBM, Apple, HP, Sun, etc.)
• Computers and I/O devices communicate with each other
• The hard disk is one I/O device that resides on the SCSI bus
[Figure: disk and tape attached to the SCSI bus]
Adapted from Patterson, Copyright 1997 UCB

What is Computer Architecture?
Layers, from top to bottom:
• User application
• Operating system
• Compiler, firmware
• Instruction set
• Instruction set processor, I/O system
• Datapath & control
• Digital design
• Circuit design
• Layout
• Transistors / semiconductor
• Electrons / holes
Copyright 1997 UCB

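Pitfall: Amdahl’s Law
• Improving an aspect of a computer and expecting a proportional improvement in overall performance
  – Improved time = affected time / improvement factor + unaffected time
• Example: multiply accounts for 80 s of a 100 s run
  – How much improvement in multiply performance is needed to get 5× overall?
  – 5× overall means 20 s total, but the 20 s not spent on multiply is unchanged, so multiply would have to take 0 s
  – Can’t be done!
• Corollary: make the common case fast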
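Amdahl’s Law turned around: given a target overall speedup, solve for the speedup the affected part needs, or report that none exists. A minimal Python sketch using the slide's 80 s of multiply in a 100 s run; the 4× target is an extra case added for contrast.

```python
def improved_time(total_s, affected_s, speedup):
    """Amdahl's Law: only the affected part gets faster."""
    return affected_s / speedup + (total_s - affected_s)


def required_speedup(total_s, affected_s, target_overall):
    """Speedup of the affected part needed for a target overall speedup, or None."""
    target_time = total_s / target_overall
    unaffected = total_s - affected_s
    if target_time <= unaffected:
        return None   # impossible: the unaffected part alone uses the whole budget
    return affected_s / (target_time - unaffected)


print(required_speedup(100.0, 80.0, 5.0))   # None: 20 s target, 20 s not multiply
print(required_speedup(100.0, 80.0, 4.0))   # 16.0
print(improved_time(100.0, 80.0, 16.0))     # 25.0
```

Even a 4× overall gain already demands a 16× faster multiply, which is why the corollary says to make the common case fast.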

Fallacy: Low Power at Idle
• Look back at the X4 power benchmark
  – At 100% load: 295 W
  – At 50% load: 246 W (83%)
  – At 10% load: 180 W (61%)
• Google data center
  – Mostly operates at 10% to 50% load
  – At 100% load less than 1% of the time
• Consider designing processors to make power proportional to load

Pitfall: MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
  – Doesn’t account for
    • Differences in ISAs between computers
    • Differences in complexity between instructions
  – CPI varies between programs on a given CPU
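A small Python illustration of the pitfall, using MIPS = Instruction Count / (Execution Time × 10^6). Machines X and Y are hypothetical: different ISAs running the same program at the same 4 GHz clock, with X executing many simple instructions and Y fewer, more complex ones.

```python
def execution_time_s(instruction_count, cpi, clock_rate_hz):
    """CPU Time = Instruction Count x CPI / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz


def mips_rating(instruction_count, cpi, clock_rate_hz):
    """MIPS = Instruction Count / (Execution Time x 10^6)."""
    return instruction_count / (execution_time_s(instruction_count, cpi, clock_rate_hz) * 1e6)


# Hypothetical machines X and Y with different ISAs, both at 4 GHz, same program.
machines = {
    "X": dict(instruction_count=10e9, cpi=1.0, clock_rate_hz=4e9),  # many simple instructions
    "Y": dict(instruction_count=4e9, cpi=2.0, clock_rate_hz=4e9),   # fewer, complex instructions
}

for name, m in machines.items():
    print(f"{name}: {mips_rating(**m):.0f} MIPS, {execution_time_s(**m):.1f} s")
```

X scores twice the MIPS rating of Y (4000 vs. 2000) yet takes longer (2.5 s vs. 2.0 s). The rating rewards executing many instructions, not finishing the program sooner.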

Concluding Remarks
• Cost/performance is improving
  – Due to underlying technology development
• Hierarchical layers of abstraction
  – In both hardware and software
• Instruction set architecture
  – The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
  – Use parallelism to improve performance