ECE 313 Computer Organization Lecture 20 Memory Hierarchy

- Slides: 36
ECE 313 - Computer Organization
Lecture 20 - Memory Hierarchy 1
Fall 2004
Reading: 7.1-7.3

Prof. John Nestor
ECE Department
Lafayette College
Easton, Pennsylvania 18042
nestorj@lafayette.edu

Portions of these slides are derived from:
  Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
  Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
  Dave Patterson's CS 152 Slides - Fall 1997 © UCB
  Rob Rutenbar's 18-347 Slides - Fall 1999 CMU
  other sources as noted

ECE 313 Fall 2004 Lecture 20 - Memory

Outline - Memory Systems
- Overview
  - Motivation
  - General Structure and Terminology
- Memory Technology
  - Static RAM
  - Dynamic RAM
  - Disks
- Cache Memory
- Virtual Memory

Memory Systems - the Big Picture
- Memory provides processor with
  - Instructions
  - Data
- Problem: memory is too slow and too small
[Figure: the "Five Classic Components" picture: Processor (Control, Datapath), Memory, Input, Output; Memory supplies Instructions and Data]

Memory Hierarchy - the Big Picture
- Problem: memory is too slow and too small
- Solution: memory hierarchy
[Figure: Processor (Control, Datapath, Registers), L1 On-Chip Cache, L2 Off-Chip Cache, Main Memory (DRAM), Secondary Storage (Disk)]
  Speed: Fastest (registers) to Slowest (disk)
  Size:  Smallest (registers) to Biggest (disk)
  Cost:  Highest (registers) to Lowest (disk)

Why Hierarchy Works
- The principle of locality
  - Programs access a relatively small portion of the address space at any instant of time.
[Figure: probability of reference vs. address space, from 0 to 2^n - 1]
- Temporal locality: recently accessed data is likely to be used again
- Spatial locality: data near recently accessed data is likely to be used soon
- Result: the illusion of large, fast memory

Memory Hierarchy - Speed vs. Size

Level                       Speed (ns)          Size (bytes)
Registers                   0.25-0.5            <1K
L1/L2 Cache                 0.5-25              <16M
Main Memory (DRAM)          80-250              <16G
Secondary Storage (Disk)    5,000,000 (5 ms)    >100G

Memory Hierarchy - Terminology
[Figure: Processor and memory levels exchanging Blocks of Data. Hit: Data in Upper Level. Miss: Data not in Upper Level]

Memory Hierarchy Terminology (cont'd)
- Hit: data appears in some block in the upper level (green block)
  - Hit Rate: the fraction of memory accesses that "hit"
  - Hit Time: time to access the upper level (time to determine hit/miss + access time)
- Miss: data must be retrieved from block in lower level (orange block)
  - Miss Rate = 1 - (Hit Rate)
  - Miss Penalty: Time to replace block in upper level + Time to deliver data to the processor
- Hit Time << Miss Penalty and Hit Rate >> Miss Rate

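The terms above combine into the standard textbook estimate of average memory access time: every access pays the hit time, and the fraction that misses also pays the miss penalty. The slide does not state this formula, so the sketch below is an illustration only; the function name and the sample numbers are hypothetical.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per access = hit_time + miss_rate * miss_penalty.

    All times must use the same unit (cycles or ns).
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers, not taken from the slides:
# 1-cycle hit time, 5% miss rate, 50-cycle miss penalty.
print(average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=50))
```

Because Hit Time << Miss Penalty, even a small miss rate can dominate the average, which is why Hit Rate >> Miss Rate matters.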
Typical Memory Hierarchy - Details
- Registers - Small, fastest on-chip storage
  - Managed by compiler and run-time system
- Cache - Small, fast on-chip storage
  - Associative lookup - managed by hardware
- Memory - Slower, larger off-chip storage
  - Limited size <16 GB - managed by hardware, OS
- Disk - Slowest, largest off-chip storage
  - Virtual memory - simulate a large memory using disk, hardware, and operating system
  - File storage - store data files using operating system

Tradeoffs - Static vs. Dynamic RAM
- Static RAM (SRAM) - used for L1, L2 cache
  - Fast - 0.5-25 ns access time (less for on-chip)
  - Larger, More Expensive
  - Higher power consumption
- Dynamic RAM (DRAM) - used for PC main memory
  - Slower - 80-250 ns access time*
  - Smaller, Cheaper
  - Lower power consumption

DRAM Trends
- RAM size: 4X every 3 years
- RAM speed: 2X every 10 years

DRAM
Year     Size      Cycle Time
1980     64 Kb     250 ns
1983     256 Kb    220 ns
1986     1 Mb
1989     4 Mb      165 ns
1992     16 Mb     145 ns
1995     64 Mb     120 ns
1997?    128 Mb    ?
1999?    256 Mb    ? ns

1980-1995 Size change: 1000:1!
1980-1995 Speed change: 2:1!

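The two headline ratios can be checked against the table endpoints with a few lines of arithmetic. This is only a sanity-check sketch; the helper name `growth` is hypothetical, and the inputs are the 1980 and 1995 rows above.

```python
def growth(factor, period_years, span_years):
    """Total growth over span_years if a quantity multiplies by `factor` every period_years."""
    return factor ** (span_years / period_years)


# Rule of thumb from the slide: size grows 4X every 3 years, so over 1980-1995:
print(growth(4, 3, 15))            # 1024.0, i.e. roughly 1000:1

# Table endpoints: 64 Kb (1980) to 64 Mb (1995), with 1 Mb = 1024 Kb.
print((64 * 1024) / 64)            # 1024.0

# Table endpoints for cycle time: 250 ns (1980) to 120 ns (1995).
print(round(250 / 120, 2))         # 2.08, i.e. roughly 2:1
```

Capacity improved about a thousandfold over a period in which cycle time only halved, which is the imbalance the memory hierarchy is meant to hide.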
The Processor/Memory Speed Gap
[Figure: Performance (log scale, 1 to 1000) vs. Time (1980-2000). CPU curve follows "Moore's Law"; DRAM curve improves 9%/yr. (2X/10 yrs); the Processor-Memory Performance Gap grows 50% / year]

Addressing the Speed Gap
- Latency depends on physical limitations
- Bandwidth can be increased using:
  - Parallelism - transfer more bits / word
  - Burst transfers - transfer successive words on each cycle
- So... use bandwidth to support memory hierarchy!
  - Use cache to support locality of reference
  - Design hierarchy to transfer large blocks of memory

Current DRAM Parts
- Synchronous DRAM (SDRAM) - clocked transfer of bursts of data starting at a specific address
  - Double-Data Rate SDRAM - transfer two bits / clock cycle
  - Quad-Data Rate SDRAM - transfer four bits / clock cycle
- Rambus RDRAM - High-speed interface for fast transfers
- Current PCs use some form of SDRAM/RDRAM
  - SDRAM w/ PC100 or PC133 memory bus
  - RDRAM w/ PC800 memory bus

Memory Configuration in Current PCs
[Figure: Processor with L1 Cache, L2/L3 Cache (SRAM), System Controller, Main Memory (DRAM), (I/O Bus)]

Outline - Memory Systems
- Overview
  - Motivation
  - General Structure and Terminology
- Memory Technology
  - Static RAM
  - Dynamic RAM
- Cache Memory
- Virtual Memory

Cache Operation
- Insert between CPU, Main Mem.
- Implement with fast Static RAM
- Holds some of a program's
  - data
  - instructions
- Operation:
  - Hit: Data in Cache (no penalty)
  - Miss: Data not in Cache (miss penalty)
[Figure: CPU (Processor) connected by addr and data lines to Cache Memory, which is connected by addr and data lines to DRAM Memory]

Four Key Cache Questions:
1. Where can block be placed in cache? (block placement)
2. How can block be found in cache? (block identification)
3. Which block should be replaced on a miss? (block replacement)
4. What happens on a write? (write strategy)

Basic Cache Design
- Organized into blocks or lines
- Block Contents
  - tag - extra bits to identify block (part of block address)
  - data - data or instruction words from contiguous memory locations
- Our example:
  - One-word (4 byte) block size
  - 30-bit tag
  - Two blocks in cache
[Figure: CPU; Cache with blocks b0 (tag0, data0) and b1 (tag1, data1); Main Memory with words at 0x00000000, 0x00000004, 0x00000008, 0x0000000C]

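A minimal sketch of how the example cache could identify a block, assuming 32-bit byte addresses so that the 30-bit tag is simply the address with its 2-bit byte offset removed, and assuming any block may hold any address (every tag is compared). The function names and the list-of-tuples layout are hypothetical; they are not part of the slides.

```python
BLOCK_BYTES = 4    # one-word blocks, as in the example
OFFSET_BITS = 2    # log2(BLOCK_BYTES)


def split_address(addr):
    """Split a 32-bit byte address into (30-bit tag, 2-bit byte offset)."""
    return addr >> OFFSET_BITS, addr & (BLOCK_BYTES - 1)


def lookup(cache, addr):
    """Return the cached word on a hit, or None on a miss.

    `cache` is a list of blocks; each block is (tag, data) or None if empty.
    """
    tag, _offset = split_address(addr)
    for block in cache:
        if block is not None and block[0] == tag:
            return block[1]
    return None


# Two-block cache holding the words at addresses 0x0 and 0x4.
cache = [(0x0, "add r1, r2"), (0x1, "bne r4, r1, L")]   # b0, b1

print(split_address(0x00000004))   # (1, 0)
print(lookup(cache, 0x00000004))   # bne r4, r1, L
print(lookup(cache, 0x00000008))   # None (miss)
```

With one-word blocks the tag for address 0x4 is 1, for 0x8 it is 2, and for 0xC it is 3, which matches the tag values shown in the cache example slides.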
Cache Example (2)
- Assume:
  - r1==0, r2==1, r4==2
  - 1 cycle for cache access
  - 5 cycles for main mem. access
  - 1 cycle for instr. execution
- At cycle 1 - PC=0x00
  - Fetch instruction from memory
    - look in cache
    - MISS - fetch from main mem (5 cycle penalty)

Cache:
  b0  (empty)
  b1  (empty)

Main Memory:
  0x00000000  L: add r1, r2
  0x00000004     bne r4, r1, L
  0x00000008     sub r1, r1
  0x0000000C  L: j L

Cache Example (3)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1

- At cycle 6
  - Execute instr. add r1, r2

Cache:
  b0  tag 0x…0  L: add r1, r2
  b1  (empty)

Cache Example (4)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4

- At cycle 6 - PC=0x04
  - Fetch instruction from memory
    - look in cache
    - MISS - fetch from main mem (5 cycle penalty)

Cache:
  b0  tag 0x…0  L: add r1, r2
  b1  (empty)

Cache Example (5)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1

- At cycle 11
  - Execute instr. bne r4, r1, L

Cache:
  b0  tag 0x…0  L: add r1, r2
  b1  tag 0x…1  bne r4, r1, L

Cache Example (6)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0

- At cycle 11 - PC=0x00
  - Fetch instruction from memory
  - HIT - instruction in cache

Cache:
  b0  tag 0x…0  L: add r1, r2
  b1  tag 0x…1  bne r4, r1, L

Cache Example (7)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0
12      0x…0     add r1, r2      2

- At cycle 12
  - Execute add r1, r2

Cache:
  b0  tag 0x…0  L: add r1, r2
  b1  tag 0x…1  bne r4, r1, L

Cache Example (8)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0
12      0x…0     add r1, r2      2
12               FETCH 0x04

- At cycle 12 - PC=0x04
  - Fetch instruction from memory
  - HIT - instruction in cache

Cache:
  b0  tag 0x…0  L: add r1, r2
  b1  tag 0x…1  bne r4, r1, L

Cache Example (9)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0
12      0x…0     add r1, r2      2
12               FETCH 0x04
13      0x…4     bne r4, r1, L   2

- At cycle 13
  - Execute instr. bne r4, r1, L
  - Branch not taken

Cache:
  b0  tag 0x…0  L: add r1, r2
  b1  tag 0x…1  bne r4, r1, L

Cache Example (10)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0
12      0x…0     add r1, r2      2
12               FETCH 0x04
13      0x…4     bne r4, r1, L   2
13               FETCH 0x08

- At cycle 13 - PC=0x08
  - Fetch instruction from memory
  - MISS - not in cache

Cache:
  b0  tag 0x…0  L: add r1, r2
  b1  tag 0x…1  bne r4, r1, L

Cache Example (11)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0
12      0x…0     add r1, r2      2
12               FETCH 0x04
13      0x…4     bne r4, r1, L   2
13-17            FETCH 0x08

- At cycle 17 - PC=0x08
  - Put instruction into cache
  - Replace existing instruction

Cache:
  b0  tag 0x…2  sub r1, r1   (replaces tag 0x…0, L: add r1, r2)
  b1  tag 0x…1  bne r4, r1, L

Cache Example (12)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0
12      0x…0     add r1, r2      2
12               FETCH 0x04
13      0x…4     bne r4, r1, L   2
13-17            FETCH 0x08
18      0x…8     sub r1, r1      0

- At cycle 18
  - Execute sub r1, r1

Cache:
  b0  tag 0x…2  sub r1, r1
  b1  tag 0x…1  bne r4, r1, L

Cache Example (13)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0
12      0x…0     add r1, r2      2
12               FETCH 0x04
13      0x…4     bne r4, r1, L   2
13-17            FETCH 0x08
18      0x…8     sub r1, r1      0
18               FETCH 0x0C

- At cycle 18
  - Fetch instruction from memory
  - MISS - not in cache

Cache:
  b0  tag 0x…2  sub r1, r1
  b1  tag 0x…1  bne r4, r1, L

Cache Example (14)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0
12      0x…0     add r1, r2      2
12               FETCH 0x04
13      0x…4     bne r4, r1, L   2
13-17            FETCH 0x08
18      0x…8     sub r1, r1      0
18-22            FETCH 0x0C

- At cycle 22
  - Put instruction into cache
  - Replace existing instruction

Cache:
  b0  tag 0x…2  sub r1, r1
  b1  tag 0x…3  j L   (replaces tag 0x…1, bne r4, r1, L)

Cache Example (15)

Cycle   Address  Op/Instr.       r1
1-5              FETCH 0x…0
6       0x…0     add r1, r2      1
6-10             FETCH 0x…4
11      0x…4     bne r4, r1, L   1
11               FETCH 0x…0
12      0x…0     add r1, r2      2
12               FETCH 0x…4
13      0x…4     bne r4, r1, L   2
13-17            FETCH 0x…8
18      0x…8     sub r1, r1      0
18-22            FETCH 0x…C
23      0x…C     j L

- At cycle 23
  - Execute j L

Cache:
  b0  tag 0x…2  sub r1, r1
  b1  tag 0x…3  j L

Compare No-cache vs. Cache

Op/Instr.        NO CACHE Cycle   CACHE Cycle   Hit/Miss
FETCH 0x…0       1-5              1-5           M
add r1, r2       6                6
FETCH 0x…4       6-10             6-10          M
bne r4, r1, L    11               11
FETCH 0x…0       11-15            11            H
add r1, r2       16               12
FETCH 0x…4       16-20            12            H
bne r4, r1, L    21               13
FETCH 0x…8       21-25            13-17         M
sub r1, r1       26               18
FETCH 0x…C       26-30            18-22         M
j L              31               23

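The 31-cycle and 23-cycle totals can be reproduced with a small simulation of the fetch trace. This is a sketch under stated assumptions, not the lecture's own model: each instruction's execute cycle is assumed to overlap the next fetch (as the cycle columns suggest), so the total is the sum of fetch times plus one final execute cycle; replacement is assumed to be least-recently-used, which happens to evict the same blocks as the example. The function name `run` and its parameters are hypothetical.

```python
from collections import OrderedDict


def run(trace, num_blocks, hit_cycles=1, miss_cycles=5):
    """Simulate instruction fetches; return (total_cycles, hit/miss string).

    num_blocks=0 models the no-cache machine (every fetch goes to main memory).
    """
    cache = OrderedDict()          # tag -> True, ordered least to most recently used
    cycles = 0
    outcome = []
    for addr in trace:
        tag = addr >> 2            # one-word (4-byte) blocks
        if tag in cache:
            cache.move_to_end(tag)             # mark as most recently used
            cycles += hit_cycles
            outcome.append("H")
        else:
            if num_blocks > 0:
                if len(cache) >= num_blocks:
                    cache.popitem(last=False)  # evict least recently used block
                cache[tag] = True
            cycles += miss_cycles
            outcome.append("M")
    # The last instruction executes one cycle after its fetch completes.
    return cycles + 1, "".join(outcome)


# Fetch addresses from the example: add, bne (taken), add, bne (not taken), sub, j.
trace = [0x0, 0x4, 0x0, 0x4, 0x8, 0xC]

print(run(trace, num_blocks=0))    # (31, 'MMMMMM')
print(run(trace, num_blocks=2))    # (23, 'MMHHMM')
```

The two hits come from the loop re-executing `add` and `bne`, a small instance of temporal locality; each hit saves 4 cycles, which accounts for the 8-cycle difference.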
Cache Miss and the MIPS Pipeline
- Instruction Fetch
[Figure: pipeline diagram over Clock Cycle 1, 2+N, 3+N, 4+N, 5+N, 6+N. Compare in Cycle 1; Miss Detected in Cycle 2; Fetch Completes (Pipeline Restarts)]

Cache Miss and the MIPS Pipeline
- Load Instruction
[Figure: pipeline diagram over Clock Cycle 1, 2, 3, 4, 5, 5+N, 6+N. Compare in Cycle 4; Miss Detected in Cycle 5; Load Completes (Pipeline Restarts)]