CS 465 Computer System Architecture Introduction Lecture 1











































































- Slides: 75
• Be able to explain the organization of the classical von Neumann machine and its major functional components
• Be able to compare performance of simple system configurations and understand the performance implications of architectural choices
12/17/2021 CS 465
Class Overview
• Review of syllabus
• Typical lecture format
• Intro to Computer Architecture
Organization of a computer
Pre-requisite exam discussion (5 minutes)
Chapters 1, 6.2, 6.3, 6.4; Appendices B, C
Typical Lectures (3 hour class)
Most Lectures
• Before break – 75 minutes
  – Connect to recent articles, product ideas (student participation): 10/15 mins
  – Exam/test review: 15 mins
  – Lecture
• Break – 10 minutes
• After break – 75 minutes
  – Lecture
  – Quiz
  – Record class participation: 2 mins
Lecture + Exam
• Lecture before break
• Break – 10 minutes
• Exam 60 to 90 minutes
OR
• Exam 60 to 90 minutes
• Break – 10 minutes
• Lecture after break
What do you expect?
• WikiLeaks?
Why Are You Here?
• Learn how computers work
• How to optimize your application
  – Analyze and improve performance
  – Embedded vs desktop computers
  – Real-time, on-line, and batch applications
• CS 465 is REQUIRED!!!
The Computer Revolution • Progress in computer technology – Underpinned by Moore’s Law • Makes novel applications feasible – Computers in automobiles – Cell phones – Human genome project – World Wide Web – Search Engines • Computers are pervasive 12/17/2021 CS 465 7
Classes of Computers • Desktop computers – General purpose, variety of software – Subject to cost/performance tradeoff • Server computers – Network based – High capacity, performance, reliability – Range from small servers to building sized • Embedded computers – Hidden as components of systems – Stringent power/performance/cost constraints 12/17/2021 CS 465 8
The Processor Market 12/17/2021 CS 465 9
Understanding Performance • Algorithm – Determines number of operations executed • Programming language, compiler, architecture – Determine number of machine instructions executed per operation • Processor and memory system – Determine how fast instructions are executed • I/O system (including OS) – Determines how fast I/O operations are executed 12/17/2021 CS 465 10
§1.4 Performance
Defining Performance
• Which airplane has the best performance?
Factors that Impact Computer Architecture
[Diagram: Computer Architecture, influenced by Technology, Programming Languages, Operating Systems, History, Applications, and Market]
Why Study Computer Architecture?
• Rapid change
• New challenges – miniaturization, wearable computers, mobility
• It's exciting
• That's what computer scientists do as compared to programmers
• Helps in making purchasing decisions / giving “expert” advice
Motivating Question • How would you specify a computer system? 12/17/2021 CS 465 14
Components of a Computer The BIG Picture • Same components for all kinds of computer – Desktop, server, embedded • Input/output includes – User-interface devices • Display, keyboard, mouse – Storage devices • Hard disk, CD/DVD, flash – Network adapters • For communicating with other computers 12/17/2021 CS 465 15
Anatomy of a Computer Output device Network cable Input device 12/17/2021 CS 465 16
Anatomy of a Mouse • Optical mouse – LED illuminates desktop – Small low-res camera – Basic image processor • Looks for x, y movement – Buttons & wheel • Supersedes roller-ball mechanical mouse 12/17/2021 CS 465 17
Through the Looking Glass • LCD screen: picture elements (pixels) – Mirrors content of frame buffer memory 12/17/2021 CS 465 18
Opening the Box 12/17/2021 CS 465 19
Inside the Processor (CPU)
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
  – Small fast SRAM memory for immediate access to data
Inside the Processor • AMD Barcelona: 4 processor cores 12/17/2021 CS 465 21
Abstractions
The BIG Picture
• Abstraction helps us deal with complexity
  – Hide lower-level detail
• Instruction set architecture (ISA)
  – The hardware/software interface
• Application binary interface
  – The ISA plus system software interface
• Implementation
  – The details underlying an interface
A Safe Place for Data • Volatile main memory – Loses instructions and data when power off • Non-volatile secondary memory – Magnetic disk – Flash memory – Optical disk (CDROM, DVD) 12/17/2021 CS 465 23
Networks
• Communication and resource sharing
• Local area network (LAN): Ethernet
  – Within a building
• Wide area network (WAN): the Internet
• Wireless network: Wi-Fi, Bluetooth
Technology Trends
• Electronics technology continues to evolve
  – Increased capacity and performance
  – Reduced cost

Year  Technology                  Relative performance/cost
1951  Vacuum tube                 1
1965  Transistor                  35
1975  Integrated circuit (IC)     900
1995  Very large scale IC (VLSI)  2,400,000
2005  Ultra large scale IC        6,200,000

[Figure: DRAM capacity]
CPU
• Clock frequency
  – Period = 1/frequency
  – Typical values: several MHz to a few GHz
  – Frequency vs power
• Word size
  – Address word (a): 16 to 64 bits
    • Max memory = 2^a bytes (or words)
  – Data word: 16 to 64 bits
    • Typical operation in an instruction
CPU - Continued
• Millions of transistors on chip
• Usually includes
  – Registers
  – Datapaths
  – Control
  – Internal cache
    • Data, instruction
• Instruction set
• CS 465 focus: Instruction set architecture, datapath and control
Memory - RAM
• Volatile
• Random access
  – Same time to access all memory locations
• DRAM: dynamic, requires refresh, smaller footprint:
  – several Mb per chip, capacity increases ~60% per year
  – 10 – 50 ns access time, improves ~10% per year
• SRAM: static, no refresh, bigger footprint: several Mb per chip, 2 – 15 ns access time
Memory - ROM
• Non-volatile
• ROM – large volume; look up table
• PROM – programmable
• EPROM – erase with UV exposure
  – Development process
• EEROM – change specific memory location
  – POS
Question • If the fastest (cache) memory access time is 5 ns then the CPU clock should be – 1 ns – 3 ns – 6 ns – 9 ns 12/17/2021 CS 465 30
Secondary Memory – Disk
• Magnetic
  – Hard drive
    • Several GB
    • Access time = seek time + average rotational latency
      – Seek time: 5 to 20 ms
      – Avg rotational latency: half a rotation, at 3600 to 9600 RPM
    • Data transfer rate – several MB per second
  – Floppy drive: 100 ms, 1.44 MB
I/O Example: Disk Drives
[Figure: disk drive with cylinders labeled]
• To access data:
  – seek: position head over the proper track (8 to 20 ms avg.)
  – rotational latency: wait for desired sector (half a rotation, i.e. 0.5 / RPM minutes)
  – transfer: grab the data (one or more sectors), 2 to 15 MB/sec
© 1998 Morgan Kaufmann Publishers
Disk Read Time
• Seek time + Average rotational latency + Data transfer time + Controller time
• Avg. rotational latency
  – 5400 rpm: 0.5 rotation / (5400 rpm / (60 sec/min)) = 5.6 ms
• Data transfer time
  – 512 byte sector, 10 MB/sec: 512 bytes / (10 MB/sec) = 0.05 ms
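The disk read time arithmetic can be checked with a short Python sketch. The function names are illustrative, 1 MB is taken as 10^6 bytes, and the 10 ms seek time is an assumed value (inside the 5 to 20 ms range quoted earlier), not a figure from this slide.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a rotation, in ms."""
    rotations_per_second = rpm / 60.0
    return 0.5 / rotations_per_second * 1000.0


def transfer_time_ms(num_bytes: int, rate_mb_per_s: float) -> float:
    """Time to move num_bytes at the given rate (1 MB assumed to be 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 1e6) * 1000.0


def disk_read_time_ms(seek_ms: float, rpm: float, num_bytes: int,
                      rate_mb_per_s: float, controller_ms: float = 0.0) -> float:
    """Seek time + average rotational latency + transfer time + controller time."""
    return (seek_ms
            + avg_rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_s)
            + controller_ms)


latency = avg_rotational_latency_ms(5400)   # about 5.6 ms
transfer = transfer_time_ms(512, 10)        # about 0.05 ms
# Assumed 10 ms seek and no controller overhead, for illustration only.
total = disk_read_time_ms(seek_ms=10.0, rpm=5400, num_bytes=512, rate_mb_per_s=10)
print(f"latency={latency:.2f} ms, transfer={transfer:.3f} ms, total={total:.2f} ms")
```

With these inputs the mechanical terms (seek and rotation) dominate; the transfer of a single sector is almost negligible.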
Secondary Memory - Tape • Sequential, off-line storage, disaster recovery • Density: thousands of bits per inch • Speed: tens of inches per second • Magnetic - several GB per tape • Optical – several TB per tape 12/17/2021 CS 465 34
Input, Output devices
Input devices
• Keyboard – 0.01 KB/s
• Mouse – 0.02 KB/s
• Scanner
• Voice
Output devices
• Printer
• Monitor
  – CRT
  – LCD
• Voice
Devices are very slow as compared to CPU
Network devices
• Modem: start, stop bits, character synchronization, 56 Kbps
• LAN: Ethernet, 10/100 Mbps, Carrier Sense Multiple Access with Collision Detection (CSMA/CD), back off, hubs
• WAN: Interconnected systems, routers
$100 Laptop? 12/17/2021 CS 465 37
Response Time and Throughput
• Response time
  – How long it takes to do a task
• Throughput
  – Total work done per unit time
    • e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
  – Replacing the processor with a faster version?
  – Adding more processors?
• We'll focus on response time for now…
Relative Performance
• Define Performance = 1/Execution Time
• “X is n times faster than Y”
  – Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n
• Example: time taken to run a program
  – 10 s on A, 15 s on B
  – Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5
  – So A is 1.5 times faster than B
Measuring Execution Time • Elapsed time – Total response time, including all aspects • Processing, I/O, OS overhead, idle time – Determines system performance • CPU time – Time spent processing a given job • Discounts I/O time, other jobs’ shares – Comprises user CPU time and system CPU time – Different programs are affected differently by CPU and system performance 12/17/2021 CS 465 40
SPEC CPU Benchmark • Programs used to measure performance – Supposedly typical of actual workload • Standard Performance Evaluation Corp (SPEC) – Develops benchmarks for CPU, I/O, Web, … • SPEC CPU 2006 – Elapsed time to execute a selection of programs • Negligible I/O, so focuses on CPU performance – Normalize relative to reference machine 12/17/2021 CS 465 41
Propagation delay
• What is the propagation delay for a 200 MHz signal traveling 100 meters?
• Speed of light is 3 × 10^8 meters per sec.
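One way to work the question is sketched below in Python, assuming the signal travels at the stated speed of light (signals in real cables are somewhat slower). The delay depends only on distance and speed; the frequency tells us how many clock periods elapse while the signal is in flight.

```python
SPEED_OF_LIGHT_M_PER_S = 3e8   # value given on the slide
distance_m = 100.0
frequency_hz = 200e6           # 200 MHz

# Propagation delay depends only on distance and signal speed.
delay_s = distance_m / SPEED_OF_LIGHT_M_PER_S

# Clock period of the 200 MHz signal, and how many periods fit in the delay.
period_s = 1.0 / frequency_hz
cycles_in_flight = delay_s / period_s

print(f"delay = {delay_s * 1e9:.1f} ns")
print(f"period = {period_s * 1e9:.1f} ns")
print(f"cycles in flight = {cycles_in_flight:.1f}")
```

Under these assumptions the delay is roughly 333 ns, which spans several dozen 5 ns clock periods.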
Computer System Performance 1
• Consider a computer system that executes a program in 100 s: 90 s of CPU and 10 s of I/O.
• If there is a second program executing in 50 s: 40 s of CPU and 10 s of I/O.
• Total time = 150 s or can we do better?
[Figure: two timelines built from the 90 s, 10 s, 40 s, and 10 s segments; one with Total = 150, one with Total = 140]
Computer System Performance 2
• Consider a computer system that executes a program in 100 s: 90 s of CPU and 10 s of I/O.
• The CPU performance improves each year, and assume that the CPU time reduces 33% each year.
• I/O time reduces by 10% per year.
• How much faster will the program run in 5 years?
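A possible way to work this question is sketched below in Python. It assumes the reductions compound yearly (each year's time is 67% of the previous year's CPU time and 90% of the previous year's I/O time) and that CPU and I/O do not overlap; the function name is illustrative.

```python
def time_after_years(cpu_s: float, io_s: float,
                     cpu_reduction: float, io_reduction: float,
                     years: int) -> float:
    """Total run time after compounding yearly reductions in CPU and I/O time."""
    cpu = cpu_s * (1.0 - cpu_reduction) ** years
    io = io_s * (1.0 - io_reduction) ** years
    return cpu + io


old_time = 90.0 + 10.0
new_time = time_after_years(cpu_s=90.0, io_s=10.0,
                            cpu_reduction=0.33, io_reduction=0.10, years=5)
speedup = old_time / new_time
io_share = 10.0 * 0.9 ** 5 / new_time

print(f"new time = {new_time:.2f} s, speedup = {speedup:.2f}x, I/O share = {io_share:.0%}")
```

Under these assumptions the program runs about 5.5 times faster, and I/O grows from 10% of the run time to roughly a third, which is why the slower-improving component ends up limiting the gain.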
Computer System Performance 3 12/17/2021 CS 465 45
CPU Clocking
• Operation of digital hardware governed by a constant-rate clock
[Figure: clock waveform showing the clock period, with data transfer and computation in each cycle followed by a state update]
• Clock period: duration of a clock cycle
  – e.g., 250 ps = 0.25 ns = 250 × 10^-12 s
• Clock frequency (rate): cycles per second
  – e.g., 4.0 GHz = 4000 MHz = 4.0 × 10^9 Hz
CPU Time • Performance improved by – Reducing number of clock cycles – Increasing clock rate – Hardware designer must often trade off clock rate against cycle count 12/17/2021 CS 465 47
CPU Time Example
• Computer A: 2 GHz clock, 10 s CPU time
• Designing Computer B
  – Aim for 6 s CPU time
  – Can do faster clock, but causes 1.2× as many clock cycles
• How fast must Computer B clock be?
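The example can be worked with the relation CPU time = clock cycles / clock rate. The Python sketch below is one way to do the arithmetic; the variable names are illustrative.

```python
# Computer A: known clock rate and CPU time give its cycle count.
rate_a_hz = 2e9
cpu_time_a_s = 10.0
cycles_a = cpu_time_a_s * rate_a_hz          # 20 x 10^9 cycles

# Computer B: 1.2x as many cycles, target CPU time of 6 s.
cycles_b = 1.2 * cycles_a
cpu_time_b_s = 6.0
rate_b_hz = cycles_b / cpu_time_b_s          # required clock rate

print(f"Computer B needs about {rate_b_hz / 1e9:.1f} GHz")
```

So B needs roughly twice A's clock rate to cut the CPU time from 10 s to 6 s, because part of the faster clock is spent on the extra cycles.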
Instruction Count and CPI • Instruction Count for a program – Determined by program, ISA and compiler • Average cycles per instruction – Determined by CPU hardware – If different instructions have different CPI • Average CPI affected by instruction mix 12/17/2021 CS 465 49
CPI Example
• Computer A: Cycle Time = 250 ps, CPI = 2.0
• Computer B: Cycle Time = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
(The worked solution on the slide concludes that A is faster.)
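Because both machines run the same ISA, the instruction count is the same and cancels out, so it is enough to compare time per instruction (CPI × cycle time). A small Python sketch of that comparison, with illustrative variable names:

```python
# Average time per instruction = CPI x clock cycle time (in picoseconds).
time_per_instr_a_ps = 2.0 * 250.0   # Computer A
time_per_instr_b_ps = 1.2 * 500.0   # Computer B

# Same ISA and program, so the instruction count cancels in the ratio.
ratio = time_per_instr_b_ps / time_per_instr_a_ps

print(f"A: {time_per_instr_a_ps:.0f} ps/instr, B: {time_per_instr_b_ps:.0f} ps/instr")
print(f"A is {ratio:.1f} times faster than B")
```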
CPI in More Detail • If different instruction classes take different numbers of cycles • Weighted average CPI Relative frequency 12/17/2021 CS 465 51
CPI Example
• Alternative compiled code sequences using instructions in classes A, B, C

Class             A  B  C
CPI for class     1  2  3
IC in sequence 1  2  1  2
IC in sequence 2  4  1  1

• Sequence 1: IC = 5
  – Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  – Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
  – Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  – Avg. CPI = 9/6 = 1.5
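The weighted-average CPI calculation for the two sequences can be reproduced with a short Python sketch. The dictionaries below encode the per-class CPI and instruction counts implied by the clock-cycle sums on the slide; the function names are illustrative.

```python
# CPI for each instruction class.
cpi = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two compiled sequences.
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}


def clock_cycles(counts: dict) -> int:
    """Total cycles: sum over classes of (CPI for class x instruction count)."""
    return sum(cpi[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Weighted average CPI: total cycles divided by total instruction count."""
    return clock_cycles(counts) / sum(counts.values())


for name, seq in (("Sequence 1", seq1), ("Sequence 2", seq2)):
    print(name, "IC =", sum(seq.values()),
          "cycles =", clock_cycles(seq),
          "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions but fewer cycles, which illustrates why instruction count alone is a poor predictor of execution time.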
Performance Summary The BIG Picture • Performance depends on – Algorithm: affects IC, possibly CPI – Programming language: affects IC, CPI – Compiler: affects IC, CPI – Instruction set architecture: affects IC, CPI, Tc 12/17/2021 CS 465 53
§1.5 The Power Wall
Power Trends
• In CMOS IC technology
[Equation and figure annotations: ×30, 5 V → 1 V, ×1000]
Reducing Power • Suppose a new CPU has – 85% of capacitive load of old CPU – 15% voltage and 15% frequency reduction • The power wall – We can’t reduce voltage further – We can’t remove more heat • How else can we improve performance? 12/17/2021 CS 465 55
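The effect of the new CPU's changes can be estimated with the usual CMOS dynamic power relation, Power = Capacitive load × Voltage² × Frequency. That relation does not appear in the text of this slide, so treat it as an assumption of this Python sketch; the names are illustrative.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power in arbitrary units: C x V^2 x f (assumed CMOS relation)."""
    return capacitive_load * voltage ** 2 * frequency


# Normalize the old CPU to 1.0 for every factor.
p_old = dynamic_power(1.0, 1.0, 1.0)

# New CPU: 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
p_new = dynamic_power(0.85, 0.85, 0.85)

ratio = p_new / p_old
print(f"new power / old power = {ratio:.2f}")
```

Under this model the new CPU uses about 52% of the old CPU's power, with most of the saving coming from the squared voltage term.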
§1.6 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance
[Figure: uniprocessor performance over time]
Constrained by power, instruction-level parallelism, memory latency
Multiprocessors • Multicore microprocessors – More than one processor per chip • Requires explicitly parallel programming – Compare with instruction level parallelism • Hardware executes multiple instructions at once • Hidden from the programmer – Hard to do • Programming for performance • Load balancing • Optimizing communication and synchronization 12/17/2021 CS 465 57
What is Computer Architecture ? • Computer Architecture = Instruction Set Architecture + Machine Organization 12/17/2021 CS 465 58
Instruction Set Architecture Definition • Organization of programmable storage • Data types and structures – Encoding and representation • Instruction set • Instruction format • Addressing modes, accessing data and instructions • Exception conditions 12/17/2021 CS 465 59
Instruction Set – Software/Hardware Interface
[Diagram: Software | INSTRUCTION SET | Hardware]
MIPS R3000 Instruction Set Architecture (Summary)
• Instruction Categories
  – Load/Store
  – Data Manipulation
    • Floating point
  – Program Manipulation
    • Branch & Jump
  – Special
• Registers: R0 - R31, PC, HI, LO
• 3 Instruction Formats: all 32 bits wide
    OP | rs | rt | rd | sa | funct       (Arithmetic)
    OP | rs | rt | address/immediate     (Branch, imm.)
    OP | jump target                     (Jump)
Q: How many already familiar with MIPS ISA?
Copyright 1997 UCB
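To make the three formats concrete, here is a minimal Python sketch that splits a 32-bit MIPS instruction word into its fields. It uses the standard MIPS field widths (6-bit OP; 5-bit rs, rt, rd, sa; 6-bit funct; 16-bit immediate; 26-bit jump target). The format selection is deliberately simplified: opcode 0 is treated as the arithmetic format, opcodes 2 and 3 (j, jal) as the jump format, and everything else as the immediate format, which ignores coprocessor and other special encodings.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into fields (simplified sketch)."""
    op = (word >> 26) & 0x3F          # bits 31..26
    rs = (word >> 21) & 0x1F          # bits 25..21
    rt = (word >> 16) & 0x1F          # bits 20..16
    if op == 0:                       # arithmetic (register) format
        return {"format": "R", "op": op, "rs": rs, "rt": rt,
                "rd": (word >> 11) & 0x1F,
                "sa": (word >> 6) & 0x1F,
                "funct": word & 0x3F}
    if op in (2, 3):                  # j and jal use the jump format
        return {"format": "J", "op": op, "target": word & 0x3FFFFFF}
    # Everything else is treated as the immediate format in this sketch.
    return {"format": "I", "op": op, "rs": rs, "rt": rt, "imm": word & 0xFFFF}


lw_fields = decode(0x8C4F0000)    # lw $15, 0($2)
add_fields = decode(0x00221820)   # add $3, $1, $2
print(lw_fields)
print(add_fields)
```

Fixed field positions are what let the hardware start reading registers rs and rt before it has finished working out which instruction it is executing.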
Examples of ISAs
• IBM 360/370
• Motorola PowerPC
• DEC VAX, Alpha
• HP PA-RISC
• Sun SPARC
• SGI MIPS
• Intel 80x86, Pentium, MMX, Pentium 4
Abstraction – Hides Details
• Chip with millions of devices
• Software with millions of lines of code (instructions)
• We need tools to handle this complexity
  – Hide unnecessary details
  – Communication between layers
• Use of abstraction is essential for complex system design.
• Abstraction used in software and hardware design.
• Each layer reveals more detail.

High-level language program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  (Compiler)
Assembly language program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  (Assembler)
Binary machine language:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  (Machine Interpretation)
Control Signal Specification:
    ALUOP[0:3] <= InstReg[9:11] & MASK
Adapted from Patterson, Copyright 1997 UCB
Machine Organization • Functional Units – capabilities and performance • Registers, ALU • Interconnection of the FUs - datapaths • Information flow between FUs • Control architecture – Logic to meet instruction set requirements • Register Transfer Level description 12/17/2021 CS 465 64
Levels of Organization
[Diagram: Computer comprising Processor (Control, Datapath, Cache: Data, Instr), Memory, and Devices (Input, Output)]
Workstation Design Target:
• 25% of cost on Processor
• 25% of cost on Memory (minimum memory size)
• Rest on I/O devices, power supplies, box
Adapted from Patterson, Copyright 1997 UCB
Von Neumann Architecture
[Figure: Fetch, Decode, Execute cycle. From Wikipedia]
Instruction Execution Cycle
• Instruction Fetch: Obtain instruction from program storage
• Instruction Decode: Determine required actions and instruction size
• Operand Fetch: Locate and obtain operand data
• Execute: Compute result value or status
• Result Store: Deposit results in storage for later use
• Next Instruction: Determine successor instruction
Copyright 1997 UCB
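The six steps can be illustrated with a toy stored-program machine. The Python sketch below is purely hypothetical: the four-instruction accumulator ISA (LOAD, ADD, STORE, HALT) and the memory layout are invented for illustration and do not correspond to MIPS or any real machine. Instructions and data share one memory, as in the von Neumann model.

```python
def run(memory: list) -> list:
    """Execute a toy accumulator program held in the same memory as its data."""
    pc = 0      # program counter
    acc = 0     # accumulator register
    while True:
        instruction = memory[pc]            # instruction fetch
        opcode, address = instruction       # instruction decode
        pc += 1                             # next instruction (sequential successor)
        if opcode == "LOAD":
            acc = memory[address]           # operand fetch
        elif opcode == "ADD":
            acc = acc + memory[address]     # operand fetch, then execute
        elif opcode == "STORE":
            memory[address] = acc           # result store
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Addresses 0-3 hold the program, addresses 5-7 hold data.
program = [
    ("LOAD", 5),    # acc = memory[5]
    ("ADD", 6),     # acc = acc + memory[6]
    ("STORE", 7),   # memory[7] = acc
    ("HALT", 0),
    0,              # unused
    2,              # memory[5]
    3,              # memory[6]
    0,              # memory[7], receives the result
]

result = run(list(program))
print(result[7])
```

A real processor adds branches (which change the successor instruction) and overlaps these steps, but the loop structure is the same.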
Processor and Caches SPARCstation 20 MBus Module Processor MBus Slot 1 MBus Slot 0 Registers Datapath Internal Cache Control External Cache 12/17/2021 CS 465 68 Copyright 1997 UCB
Memory SIMM Slot 7 SIMM Slot 6 SIMM Slot 5 SIMM Slot 4 SIMM Slot 3 SIMM Slot 2 SIMM Slot 1 SIMM Slot 0 Memory Controller SPARCstation 20 Memory SIMM Bus DRAM SIMM 12/17/2021 DRAM DRAM DRAM CS 465 Copyright 1997 UCB 69
Input and Output (I/O) Devices • SCSI Bus: Standard I/O Devices • High Speed I/O Devices • External Bus: Low Speed I/O Device 12/17/2021 CS 465 70
Standard I/O Devices
• SCSI = Small Computer Systems Interface
• A standard interface (IBM, Apple, HP, Sun, etc.)
• Computers and I/O devices communicate with each other
• The hard disk is one I/O device that resides on the SCSI Bus
[Diagram: Disk and Tape attached to the SCSI Bus]
Adapted from Patterson, Copyright 1997 UCB
What is Computer Architecture? User Application Operating System Compiler Firmware Instruction Set Instr. Set Proc. I/O system Datapath & Control Digital Design Circuit Design Layout Transistors / Semiconductor Electrons / Holes 12/17/2021 CS 465 Copyright 1997 UCB 72
Pitfall: Amdahl’s Law • Improving an aspect of a computer and expecting a proportional improvement in overall performance • Example: multiply accounts for 80 s/100 s – How much improvement in multiply performance to get 5× overall? – Can’t be done! • Corollary: make the common case fast 12/17/2021 CS 465 73
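The reasoning behind "Can't be done!" follows from Amdahl's Law: improved time = (affected time / improvement factor) + unaffected time. The Python sketch below applies it to the multiply example; the function names are illustrative.

```python
def improved_time(affected_s: float, unaffected_s: float, factor: float) -> float:
    """Amdahl's Law: only the affected part of the run time shrinks."""
    return affected_s / factor + unaffected_s


def required_factor(affected_s: float, unaffected_s: float, target_s: float):
    """Improvement factor needed to reach target_s, or None if it is unreachable."""
    remaining = target_s - unaffected_s
    if remaining <= 0:
        return None     # the unaffected part alone already uses up the target
    return affected_s / remaining


# Multiply takes 80 s of a 100 s run; 5x overall means a 20 s target.
print(required_factor(80.0, 20.0, 100.0 / 5))   # None: cannot be done
# A 4x overall speedup (25 s target) is still reachable.
print(required_factor(80.0, 20.0, 100.0 / 4))   # 16.0
print(improved_time(80.0, 20.0, 16.0))          # 25.0
```

The 20 s that multiply does not touch sets a floor on the run time, so no multiply speedup can bring the total down to 20 s.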
Fallacy: Low Power at Idle
• Look back at X4 power benchmark
  – At 100% load: 295 W
  – At 50% load: 246 W (83%)
  – At 10% load: 180 W (61%)
• Google data center
  – Mostly operates at 10% – 50% load
  – At 100% load less than 1% of the time
• Consider designing processors to make power proportional to load
Pitfall: MIPS as a Performance Metric • MIPS: Millions of Instructions Per Second – Doesn’t account for • Differences in ISAs between computers • Differences in complexity between instructions – CPI varies between programs on a given CPU 12/17/2021 CS 465 75
Concluding Remarks • Cost/performance is improving – Due to underlying technology development • Hierarchical layers of abstraction – In both hardware and software • Instruction set architecture – The hardware/software interface • Execution time: the best performance measure • Power is a limiting factor – Use parallelism to improve performance 12/17/2021 CS 465 76