ECE 313 Computer Organization Lecture 20 Memory Hierarchy











































ECE 313 - Computer Organization
Lecture 20 - Memory Hierarchy 1
Feb 2005
Reading: 7.1-7.3
Prof. John Nestor
ECE Department, Lafayette College
Easton, Pennsylvania 18042
nestorj@lafayette.edu
Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson’s CS 152 Slides - Fall 1997 © UCB
- Rob Rutenbar’s 18-347 Slides - Fall 1999 CMU
- other sources as noted
Feb 2005 Lecture 20 - Memory

Roadmap for the term: major topics
- Overview / Abstractions and Technology
- Instruction sets
- Logic & arithmetic
- Performance
- Processor Implementation
  - Single-cycle implementation
  - Multicycle implementation
  - Pipelined implementation
- Memory systems
- Input/Output

Outline - Memory Systems
- Overview
  - Motivation
  - General Structure and Terminology
- Memory Technology
  - Static RAM
  - Dynamic RAM
  - Disks
- Cache Memory
- Virtual Memory

Memory Systems - the Big Picture
- Memory provides processor with
  - Instructions
  - Data
- Problem: memory is too slow and too small
[Figure: "Five Classic Components" picture. Processor (Control, Datapath), Memory (Data, Instructions), Input, Output]

Memory Hierarchy - the Big Picture
- Problem: memory is too slow and too small
- Solution: memory hierarchy
[Figure: Processor (Control, Datapath, Registers) -> L1 On-Chip Cache -> L2 Off-Chip Cache -> Main Memory (DRAM) -> Secondary Storage (Disk)]
  Speed: Fastest  -> Slowest
  Size:  Smallest -> Biggest
  Cost:  Highest  -> Lowest

Why Hierarchy Works
- The principle of locality
  - Programs access a relatively small portion of the address space at any instant of time.
- Temporal locality: recently accessed data is likely to be used again
- Spatial locality: data near recently accessed data is likely to be used soon
- Result: the illusion of large, fast memory
[Figure: probability of reference plotted over the address space, from 0 to 2^n - 1]

Memory Hierarchy - Speed vs. Size
  Level                                  Speed (ns)          Size (bytes)
  Registers                              0.25-0.5            <1K
  L1 On-Chip / L2 Off-Chip Cache         0.5-25              <16M
  Main Memory (DRAM)                     80-250              <16G
  Secondary Storage (Disk)               5,000,000 (5 ms)    >100G

Memory Hierarchy - Terminology
- Hit: Data in Upper Level
- Miss: Data not in Upper Level
[Figure: Processor and blocks of data in the upper and lower levels]

Memory Hierarchy Terminology (cont’d)
- Hit: data appears in some block in the upper level (green block)
  - Hit Rate: the fraction of memory accesses that “hit”
  - Hit Time: time to access the upper level (time to determine hit/miss + access time)
- Miss: data must be retrieved from block in lower level (orange block)
  - Miss Rate = 1 - (Hit Rate)
  - Miss Penalty: Time to replace block in upper level + Time to deliver data to the processor
- Hit Time << Miss Penalty and Hit Rate >> Miss Rate
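The terms above are commonly combined into the average memory access time, AMAT = Hit Time + Miss Rate x Miss Penalty. That relation is not stated on the slide; it is the usual textbook definition, and the numbers in the sketch below are illustrative assumptions rather than measurements.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in the same units as hit_time."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    # Every access pays the hit time; only the missing fraction pays the penalty.
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical values: 1-cycle hit, 5% miss rate, 5-cycle miss penalty.
    print(amat(hit_time=1, miss_rate=0.05, miss_penalty=5))
```

Because Hit Time << Miss Penalty, even a small change in miss rate moves the result noticeably, which is why the hierarchy depends on a high hit rate.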

Typical Memory Hierarchy - Details
- Registers - Small, fastest on-chip storage
  - Managed by compiler and run-time system
- Cache - Small, fast on-chip storage
  - Associative lookup - managed by hardware
- Memory - Slower, larger off-chip storage
  - Limited size <16 GB - managed by hardware, OS
- Disk - Slowest, largest off-chip storage
  - Virtual memory - simulate a large memory using disk, hardware, and operating system
  - File storage - store data files using operating system

Outline - Memory Systems
- Overview
  - Motivation
  - General Structure and Terminology
- Memory Technology
  - Static RAM
  - Dynamic RAM
- Cache Memory
- Virtual Memory

Memory Types
- Static RAM
  - Storage using latch circuits
  - Values saved while power on
- Dynamic RAM
  - Storage using capacitors
  - Values must be refreshed
[Figure: SRAM cell and DRAM cell (capacitor C), each with a word / row select line and a bit line]

Tradeoffs - Static vs. Dynamic RAM
- Static RAM (SRAM) - used for L1, L2 cache
  - Fast - 0.5-25 ns access time (less for on-chip)
  - Larger, more expensive
  - Higher power consumption
- Dynamic RAM (DRAM) - used for PC main memory
  - Slower - 80-250 ns access time
  - Smaller, cheaper
  - Lower power consumption

DRAM Organization
[Figure: DRAM array. Labels: Row Address, /RAS, Row Decoder, Row Select Line, Bit (data) Line, Column Address, /CAS, Column Selector / Latch / IO, DATA]

DRAM Read Operation
[Figure: DRAM array during a read. Labels: Row Address (011), /RAS, Row Decoder, Column Address (010), /CAS, Column Selector / Latch / IO, DATA]

DRAM Trends
- RAM size: 4X every 3 years
- RAM speed: 2X every 10 years
  Year     Size      Cycle Time
  1980     64 Kb     250 ns
  1983     256 Kb    220 ns
  1986     1 Mb      --
  1989     4 Mb      165 ns
  1992     16 Mb     145 ns
  1995     64 Mb     120 ns
  1997?    128 Mb    ?
  1999?    256 Mb    ?
- DRAM 1980-1995 size change: 1000:1!
- DRAM 1980-1995 speed change: 2:1!
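As a quick arithmetic check of the two ratios quoted on this slide, the sketch below uses only the 1980 and 1995 figures given above; the variable names are introduced here for illustration.

```python
# Figures from the 1980 and 1995 entries of the DRAM trends slide.
size_1980_kb = 64            # 64 Kb
size_1995_kb = 64 * 1024     # 64 Mb expressed in Kb
cycle_1980_ns = 250
cycle_1995_ns = 120

size_ratio = size_1995_kb / size_1980_kb       # capacity growth over 15 years
speed_ratio = cycle_1980_ns / cycle_1995_ns    # cycle-time improvement

# "4X every 3 years" compounded over the five 3-year steps from 1980 to 1995.
predicted_size_ratio = 4 ** (15 // 3)

print(f"size:  {size_ratio:.0f}:1 (4X/3 years predicts {predicted_size_ratio}:1)")
print(f"speed: {speed_ratio:.2f}:1")
```

The capacity ratio comes out to 1024:1, which the slide rounds to 1000:1, while cycle time improves by only about a factor of two over the same period.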

The Processor/Memory Speed Gap
[Figure: Performance (log scale, 1 to 1000) vs. Time (1980 to 2000). Curves: CPU (“Moore’s Law”) and DRAM (9%/yr, 2X/10 yrs). Processor-Memory Performance Gap grows 50% / year]
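The annual rates in the chart can be compounded to see what they imply over a decade. This is a minimal sketch; `compound` is a helper name introduced here, and only the 9%/yr and 50%/yr rates come from the slide.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Growth factor after compounding rate_per_year for the given number of years."""
    return (1.0 + rate_per_year) ** years


dram_10yr = compound(0.09, 10)   # DRAM performance: 9% per year
gap_10yr = compound(0.50, 10)    # processor-memory gap: grows 50% per year

print(f"DRAM after 10 years: {dram_10yr:.2f}x")
print(f"Gap after 10 years:  {gap_10yr:.1f}x")
```

Nine percent per year compounds to a little over 2X in ten years, consistent with the "2X/10 yrs" label, while a gap growing 50% per year widens by more than 50X in the same time.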

Addressing the Speed Gap
- Latency depends on physical limitations
- Bandwidth can be increased using:
  - Parallelism - transfer more bits / word
  - Burst transfers - transfer successive words on each cycle
- So... use bandwidth to support memory hierarchy!
  - Use cache to support locality of reference
  - Design hierarchy to transfer large blocks of memory
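To illustrate why transferring larger blocks helps when accesses have spatial locality, the sketch below counts block transfers for a sequential scan. It assumes an idealized upper level that never evicts a block; the function name, the 64-word scan, and the block sizes are all hypothetical.

```python
def count_block_fetches(word_addresses, words_per_block):
    """Count block transfers when each miss brings in one whole block.

    Idealization (an assumption): the upper level is large enough that
    no fetched block is ever evicted.
    """
    resident = set()
    fetches = 0
    for addr in word_addresses:
        block = addr // words_per_block   # block that contains this word
        if block not in resident:
            resident.add(block)
            fetches += 1
    return fetches


scan = range(64)  # hypothetical sequential scan of 64 consecutive words
for words_per_block in (1, 4, 16):
    print(words_per_block, count_block_fetches(scan, words_per_block))
```

With one-word blocks every access is a separate transfer; with 16-word blocks the same scan needs only four transfers, each of which can use a burst to amortize the latency.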

Current DRAM Parts
- Synchronous DRAM (SDRAM) - clocked transfer of bursts of data starting at a specific address
- Double-Data Rate SDRAM - transfer two bits per data pin per clock cycle
- Quad-Data Rate SDRAM - transfer four bits per data pin per clock cycle
- Rambus RDRAM - High-speed interface for fast transfers
- Current PCs use some form of SDRAM/RDRAM
  - SDRAM w/ PC100 or PC133 memory bus
  - RDRAM w/ PC800 memory bus
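A rough way to compare these parts is peak transfer rate: clock rate times transfers per clock times bus width. The sketch below assumes a 64-bit memory data path, which is not stated on the slide, and ignores latency and protocol overhead, so the results are upper bounds rather than delivered bandwidth.

```python
BUS_WIDTH_BITS = 64  # assumption: 64-bit memory data path


def peak_bandwidth_mb_s(clock_mhz, bus_width_bits=BUS_WIDTH_BITS, transfers_per_clock=1):
    """Peak transfer rate in MB/s, taking 1 MB as 10**6 bytes."""
    return clock_mhz * transfers_per_clock * bus_width_bits / 8


pc100 = peak_bandwidth_mb_s(100)                             # SDRAM on a 100 MHz bus
pc133 = peak_bandwidth_mb_s(133)                             # SDRAM on a 133 MHz bus
ddr_133 = peak_bandwidth_mb_s(133, transfers_per_clock=2)    # double data rate at 133 MHz

print(pc100, pc133, ddr_133)
```

Doubling the transfers per clock doubles the peak rate without changing the clock, which is the point of the double-data-rate designs listed above.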

Memory Configuration in Current PCs
[Figure: Processor with L1 Cache -> L2/L3 Cache (SRAM) -> System Controller -> Main Memory (DRAM); System Controller also connects to the I/O Bus]

Outline - Memory Systems
- Overview
  - Motivation
  - General Structure and Terminology
- Memory Technology
  - Static RAM
  - Dynamic RAM
- Cache Memory
- Virtual Memory

Cache Operation
- Insert between CPU, Main Mem.
- Implement with fast Static RAM
- Holds some of a program’s
  - data
  - instructions
- Operation:
  - Hit: Data in Cache (no penalty)
  - Miss: Data not in Cache (miss penalty)
[Figure: CPU <-> (addr, data) <-> Cache Memory <-> (addr, data) <-> DRAM Memory]

Four Key Cache Questions:
1. Where can block be placed in cache? (block placement)
2. How can block be found in cache? (block identification)
3. Which block should be replaced on a miss? (block replacement)
4. What happens on a write? (write strategy)

Basic Cache Design
- Organized into blocks or lines
- Block Contents
  - tag - extra bits to identify block (part of block address)
  - data - data or instruction words (contiguous memory locations)
- Our example:
  - One-word (4 byte) block size
  - 30-bit tag
  - Two blocks in cache
[Figure: CPU; Cache with b0 (tag 0, data 0) and b1 (tag 1, data 1); Main Memory at addresses 0x00000000, 0x00000004, 0x00000008, 0x0000000C]
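The 30-bit tag follows from the block size: with 4-byte blocks, the low 2 bits of a byte address select a byte within the word and the remaining bits identify the block. The sketch below assumes 32-bit byte addresses (suggested by the 8-digit hex addresses, but not stated) and a cache in which any block can hold any address, so the whole block address serves as the tag.

```python
ADDRESS_BITS = 32    # assumption: 32-bit byte addresses
BLOCK_BYTES = 4      # one-word blocks, as in the example
OFFSET_BITS = 2      # log2(BLOCK_BYTES): byte offset within the word
TAG_BITS = ADDRESS_BITS - OFFSET_BITS


def split_address(addr: int):
    """Return (tag, byte_offset) for a byte address."""
    tag = addr >> OFFSET_BITS            # upper 30 bits identify the block
    offset = addr & (BLOCK_BYTES - 1)    # lower 2 bits select the byte
    return tag, offset


for addr in (0x00000000, 0x00000004, 0x00000008, 0x0000000C):
    tag, offset = split_address(addr)
    print(f"0x{addr:08X} -> tag 0x{tag:X}, offset {offset}")
```

Under these assumptions the four example addresses map to tags 0 through 3.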

Cache Example (2)
- Assume:
  - r1==0, r2==1, r4==2
  - 1 cycle for cache access
  - 5 cycles for main mem. access
  - 1 cycle for instr. execution
- At cycle 1 - PC=0x00
  - Fetch instruction from memory
    - look in cache
    - MISS - fetch from main mem (5 cycle penalty)
Cache:
  b0: (empty)
  b1: (empty)
Main Memory:
  0x00000000  L: add r1, r2
  0x00000004  bne r4, r1, L
  0x00000008  sub r1, r1
  0x0000000C  L: j L
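The order of instruction fetches in this example follows from the initial register values. The sketch below interprets the four-instruction program under assumptions that the slides imply but do not spell out: two-operand semantics (add r1, r2 means r1 = r1 + r2), bne branching back to address 0x00, and the walkthrough stopping once j L is reached. The function name is hypothetical.

```python
def run_example(max_instructions=6):
    """Interpret the example program; return (fetch addresses, r1 after each step)."""
    regs = {"r1": 0, "r2": 1, "r4": 2}   # initial values from the slide
    pc = 0x00
    trace, r1_values = [], []
    for _ in range(max_instructions):
        trace.append(pc)
        if pc == 0x00:                    # add r1, r2
            regs["r1"] += regs["r2"]
            pc = 0x04
        elif pc == 0x04:                  # bne r4, r1, L
            pc = 0x00 if regs["r4"] != regs["r1"] else 0x08
        elif pc == 0x08:                  # sub r1, r1
            regs["r1"] -= regs["r1"]
            pc = 0x0C
        else:                             # j L at 0x0C: stop the walkthrough here
            r1_values.append(regs["r1"])
            break
        r1_values.append(regs["r1"])
    return trace, r1_values


trace, r1_values = run_example()
print([hex(a) for a in trace])   # fetch order
print(r1_values)                 # r1 after each instruction
```

The branch is taken once (r1 is 1, r4 is 2) and falls through the second time (both are 2), so addresses 0x00 and 0x04 are each fetched twice before 0x08 and 0x0C.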

Cache Example (3)
- At cycle 6
  - Execute instr. add r1, r2
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
Cache:
  b0: tag 0x…0, data "L: add r1, r2"
  b1: (empty)

Cache Example (4)
- At cycle 6 - PC=0x04
  - Fetch instruction from memory
    - look in cache
    - MISS - fetch from main mem (5 cycle penalty)
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
Cache:
  b0: tag 0x…0, data "L: add r1, r2"
  b1: (empty)

Cache Example (5)
- At cycle 11
  - Execute instr. bne r4, r1, L
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
Cache:
  b0: tag 0x…0, data "L: add r1, r2"
  b1: tag 0x…1, data "bne r4, r1, L"

Cache Example (6)
- At cycle 11 - PC=0x00
  - Fetch instruction from memory
  - HIT - instruction in cache
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
Cache:
  b0: tag 0x…0, data "L: add r1, r2"
  b1: tag 0x…1, data "bne r4, r1, L"

Cache Example (7)
- At cycle 12
  - Execute add r1, r2
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
  12       0x…0      add r1, r2       2
Cache:
  b0: tag 0x…0, data "L: add r1, r2"
  b1: tag 0x…1, data "bne r4, r1, L"

Cache Example (8)
- At cycle 12 - PC=0x04
  - Fetch instruction from memory
  - HIT - instruction in cache
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
  12       0x…0      add r1, r2       2
  12                 FETCH 0x…4
Cache:
  b0: tag 0x…0, data "L: add r1, r2"
  b1: tag 0x…1, data "bne r4, r1, L"

Cache Example (9)
- At cycle 13
  - Execute instr. bne r4, r1, L
  - Branch not taken
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
  12       0x…0      add r1, r2       2
  12                 FETCH 0x…4
  13       0x…4      bne r4, r1, L    2
Cache:
  b0: tag 0x…0, data "L: add r1, r2"
  b1: tag 0x…1, data "bne r4, r1, L"

Cache Example (10)
- At cycle 13 - PC=0x08
  - Fetch instruction from memory
  - MISS - not in cache
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
  12       0x…0      add r1, r2       2
  12                 FETCH 0x…4
  13       0x…4      bne r4, r1, L    2
  13                 FETCH 0x…8
Cache:
  b0: tag 0x…0, data "L: add r1, r2"
  b1: tag 0x…1, data "bne r4, r1, L"

Cache Example (11)
- At cycle 17 - PC=0x08
  - Put instruction into cache
  - Replace existing instruction
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
  12       0x…0      add r1, r2       2
  12                 FETCH 0x…4
  13       0x…4      bne r4, r1, L    2
  13-17              FETCH 0x…8
Cache:
  b0: tag 0x…2, data "sub r1, r1" (replaces tag 0x…0, "L: add r1, r2")
  b1: tag 0x…1, data "bne r4, r1, L"

Cache Example (12)
- At cycle 18
  - Execute sub r1, r1
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
  12       0x…0      add r1, r2       2
  12                 FETCH 0x…4
  13       0x…4      bne r4, r1, L    2
  13-17              FETCH 0x…8
  18       0x…8      sub r1, r1       0
Cache:
  b0: tag 0x…2, data "sub r1, r1"
  b1: tag 0x…1, data "bne r4, r1, L"

Cache Example (13)
- At cycle 18 - PC=0x0C
  - Fetch instruction from memory
  - MISS - not in cache
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
  12       0x…0      add r1, r2       2
  12                 FETCH 0x…4
  13       0x…4      bne r4, r1, L    2
  13-17              FETCH 0x…8
  18       0x…8      sub r1, r1       0
  18                 FETCH 0x…C
Cache:
  b0: tag 0x…2, data "sub r1, r1"
  b1: tag 0x…1, data "bne r4, r1, L"

Cache Example (14)
- At cycle 22
  - Put instruction into cache
  - Replace existing instruction
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
  12       0x…0      add r1, r2       2
  12                 FETCH 0x…4
  13       0x…4      bne r4, r1, L    2
  13-17              FETCH 0x…8
  18       0x…8      sub r1, r1       0
  18-22              FETCH 0x…C
Cache:
  b0: tag 0x…2, data "sub r1, r1"
  b1: tag 0x…3, data "j L" (replaces tag 0x…1, "bne r4, r1, L")

Cache Example (15)
- At cycle 23
  - Execute j L
  Cycle    Address   Op/Instr.        r1
  1-5                FETCH 0x…0
  6        0x…0      add r1, r2       1
  6-10               FETCH 0x…4
  11       0x…4      bne r4, r1, L    1
  11                 FETCH 0x…0
  12       0x…0      add r1, r2       2
  12                 FETCH 0x…4
  13       0x…4      bne r4, r1, L    2
  13-17              FETCH 0x…8
  18       0x…8      sub r1, r1       0
  18-22              FETCH 0x…C
  23       0x…C      j L
Cache:
  b0: tag 0x…2, data "sub r1, r1"
  b1: tag 0x…3, data "j L"

Compare No-cache vs. Cache
  Op/Instr.        NO CACHE cycle   CACHE cycle   Hit/Miss
  FETCH 0x…0       1-5              1-5           M
  add r1, r2       6                6
  FETCH 0x…4       6-10             6-10          M
  bne r4, r1, L    11               11
  FETCH 0x…0       11-15            11            H
  add r1, r2       16               12
  FETCH 0x…4       16-20            12            H
  bne r4, r1, L    21               13
  FETCH 0x…8       21-25            13-17         M
  sub r1, r1       26               18
  FETCH 0x…C       26-30            18-22         M
  j L              31               23
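The two cycle counts in this comparison can be reproduced with a small simulation. The sketch below assumes the timing used in the walkthrough (1-cycle hit, 5-cycle miss, each instruction executing in the cycle right after its fetch completes while the next fetch begins) and an LRU replacement policy. The slides do not name a policy; LRU is an assumption that happens to match the replacements shown. The function name is hypothetical.

```python
from collections import OrderedDict

HIT_CYCLES = 1    # cache access time from the example
MISS_CYCLES = 5   # main-memory access time from the example


def final_cycle(fetch_trace, num_blocks):
    """Return (cycle in which the last instruction executes, hit/miss string).

    num_blocks == 0 models the machine with no cache.
    Replacement is LRU, which is an assumption, not something the slides state.
    """
    cache = OrderedDict()      # block address -> None, least recently used first
    cycle = 1                  # the first fetch starts in cycle 1
    outcome = []
    for addr in fetch_trace:
        block = addr >> 2      # one-word (4-byte) blocks
        if block in cache:
            cache.move_to_end(block)            # mark as most recently used
            cycle += HIT_CYCLES
            outcome.append("H")
        else:
            cycle += MISS_CYCLES
            outcome.append("M")
            if num_blocks > 0:
                if len(cache) >= num_blocks:
                    cache.popitem(last=False)   # evict least recently used block
                cache[block] = None
    # 'cycle' is now the cycle in which the last fetched instruction executes.
    return cycle, "".join(outcome)


trace = [0x0, 0x4, 0x0, 0x4, 0x8, 0xC]     # fetch order from the example
print(final_cycle(trace, num_blocks=0))    # (31, 'MMMMMM')
print(final_cycle(trace, num_blocks=2))    # (23, 'MMHHMM')
```

Only the two repeated fetches hit, yet that is enough to cut the run from 31 cycles to 23; a longer loop would reuse the cached instructions many more times.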

Cache Miss and the MIPS Pipeline
- Instruction Fetch
  - Compare in Cycle 1
  - Miss Detected in Cycle 2
  - Fetch Completes (Pipeline Restarts)
[Figure: pipeline timing diagram. Clock Cycle 1, Cycle 2+N, Cycle 3+N, Cycle 4+N, Cycle 5+N, Cycle 6+N]

Cache Miss and the MIPS Pipeline
- Load Instruction
  - Compare in Cycle 4
  - Miss Detected in Cycle 5
  - Load Completes (Pipeline Restarts)
[Figure: pipeline timing diagram. Clock Cycle 1, Cycle 2, Cycle 3, Cycle 4, Cycle 5, Cycle 5+N, Cycle 6+N]

Coming Up: Four Key Cache Questions:
1. Where can block be placed in cache? (block placement)
2. How can block be found in cache? …using a tag (block identification)
3. Which block should be replaced on a miss? (block replacement)
4. What happens on a write? (write strategy)