
Cache Memory and Performance

Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP), Randal E. Bryant and David R. O'Hallaron, http://csapp.cs.cmu.edu/public/lectures.html

The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS 2506.

CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

An Example Memory Hierarchy

Levels near the top are smaller, faster, and costlier per byte; levels near the bottom are larger, slower, and cheaper per byte.

L0: Registers. CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). Holds cache lines retrieved from main memory.
L3: Main memory (DRAM). Holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (tapes, distributed file systems, Web servers).

Random-Access Memory (RAM)

Key features
– RAM is traditionally packaged as a chip.
– The basic storage unit is normally a cell (one bit per cell).
– Multiple RAM chips form a memory.

Static RAM (SRAM)
– Each cell stores a bit with a four- or six-transistor circuit.
– Retains its value indefinitely, as long as it is kept powered.
– Relatively insensitive to electrical noise (EMI), radiation, etc.
– Faster and more expensive than DRAM.

Dynamic RAM (DRAM)
– Each cell stores a bit with a capacitor. One transistor is used for access.
– Value must be refreshed every 10-100 ms.
– More sensitive to disturbances (EMI, radiation, ...) than SRAM.
– Slower and cheaper than SRAM.

SRAM vs DRAM Summary

        Trans. per bit   Access time   Needs refresh?   Needs EDC?   Cost   Applications
SRAM    4 or 6           1X            No               Maybe        100x   Cache memories
DRAM    1                10X           Yes              Yes          1X     Main memories, frame buffers

Traditional CPU-Memory Bus Structure

A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices.

[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory.]

Memory Read Transaction (1)

Load operation: movl A, %eax

CPU places address A on the memory bus.

[Figure: address A travels from the bus interface through the I/O bridge to main memory, which holds word x at address A.]

Memory Read Transaction (2)

Load operation: movl A, %eax

Main memory reads A from the memory bus, retrieves word x, and places it on the bus.

[Figure: word x travels from main memory through the I/O bridge toward the bus interface.]

Memory Read Transaction (3)

Load operation: movl A, %eax

CPU reads word x from the bus and copies it into register %eax.

[Figure: register %eax now holds x; main memory still holds x at address A.]

Memory Write Transaction (1)

Store operation: movl %eax, A

CPU places address A on the bus. Main memory reads it and waits for the corresponding data word to arrive.

[Figure: register %eax holds y; address A travels from the bus interface through the I/O bridge to main memory.]

Memory Write Transaction (2)

Store operation: movl %eax, A

CPU places data word y on the bus.

[Figure: data word y travels from the bus interface through the I/O bridge toward main memory.]

Memory Write Transaction (3)

Store operation: movl %eax, A

Main memory reads data word y from the bus and stores it at address A.

[Figure: main memory now holds y at address A.]

The Bigger Picture: I/O Bus

[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge. The I/O bridge connects over the memory bus to main memory and over the I/O bus to a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]

Storage Trends

SRAM
Metric              1980     1985    1990   1995    2000     2005      2010        2010:1980
$/MB                19,200   2,900   320    256     100      75        60          320
access (ns)         300      150     35     15      3        2         1.5         200

DRAM
Metric              1980     1985    1990   1995    2000     2005      2010        2010:1980
$/MB                8,000    880     100    30      1        0.1       0.06        130,000
access (ns)         375      200     100    70      60       50        40          9
typical size (MB)   0.064    0.256   4      16      64       2,000     8,000       125,000

Disk
Metric              1980     1985    1990   1995    2000     2005      2010        2010:1980
$/MB                500      100     8      0.30    0.01     0.005     0.0003      1,600,000
access (ms)         87       75      28     10      8        4         3           29
typical size (MB)   1        10      160    1,000   20,000   160,000   1,500,000   1,500,000
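The 2010:1980 column for cost and access time is the 1980 value divided by the 2010 value. A minimal sketch of that arithmetic follows; the function name `improvement` is a hypothetical helper chosen for illustration, not something from the slides.

```c
#include <assert.h>

/* Improvement factor for a metric where smaller is better (cost per MB,
   access time): the old value divided by the new value.
   Hypothetical helper, named here for illustration only. */
double improvement(double value_1980, double value_2010)
{
    assert(value_2010 > 0.0);
    return value_1980 / value_2010;
}
```

For example, `improvement(300.0, 1.5)` gives 200 for SRAM access time and `improvement(87.0, 3.0)` gives 29 for disk access time, while `improvement(500.0, 0.0003)` gives roughly 1.6 million for disk cost. Cost per MB improved far more than access time did.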

The CPU-Memory Gap

The gap widens between DRAM, disk, and CPU speeds.

[Figure: time in ns (logarithmic scale) versus year, 1980 to 2010, with curves for disk seek time, flash SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time.]

Locality

Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently.

Temporal locality:
– Recently referenced items are likely to be referenced again in the near future.

Spatial locality:
– Items with nearby addresses tend to be referenced close together in time.

Locality Example

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

Data references
– Reference array elements in succession (stride-1 reference pattern): spatial locality.
– Reference variable sum each iteration: temporal locality.

Instruction references
– Reference instructions in sequence: spatial locality.
– Cycle through loop repeatedly: temporal locality.
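The loop above can be wrapped into a complete function, and the stride-1 idea can be contrasted with a larger stride on a two-dimensional array. This is a sketch under assumptions: the names `sumvec`, `sum_rows`, and `sum_cols` and the array dimensions `ROWS` and `COLS` are chosen here for illustration and do not appear in the slides.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4   /* illustrative sizes, not from the slides */
#define COLS 8

/* The slide's loop as a function: stride-1 accesses to a[] (spatial
   locality) and repeated use of sum and i (temporal locality). */
int sumvec(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Row-by-row traversal: the inner loop walks consecutive addresses,
   because C stores each row of a 2-D array contiguously. */
int sum_rows(int a[ROWS][COLS])
{
    int sum = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Column-by-column traversal: same result, but consecutive accesses are
   COLS elements apart (stride-COLS), so spatial locality is poorer. */
int sum_cols(int a[ROWS][COLS])
{
    int sum = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}
```

Both traversals return the same sum; only the order of memory references differs, which is what determines how well each one uses the cache.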

Taking Advantage of Locality

Memory hierarchy
– Store everything on disk.
– Copy recently accessed (and nearby) items from disk to smaller DRAM memory: main memory.
– Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory: cache memory attached to the CPU.

An Example Memory Hierarchy

Levels near the top are smaller, faster, and costlier per byte; levels near the bottom are larger, slower, and cheaper per byte.

L0: Registers. CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). Holds cache lines retrieved from main memory.
L3: Main memory (DRAM). Holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (tapes, distributed file systems, Web servers).

Caches

Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.

Fundamental idea of a memory hierarchy:
– For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Why do memory hierarchies work?
– Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
– Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.

Big Idea: The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
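One common way to put a number on the Big Idea is the average time per access across two adjacent levels. The sketch below is an illustration added here, not a formula from the slides; the function name `avg_access_time` and the example figures in the note that follows are assumptions.

```c
#include <assert.h>

/* Average time per access for two adjacent levels: every access pays the
   level k hit time, and the fraction that misses also pays the penalty of
   going to level k+1. All times must use the same unit (e.g., cycles). */
double avg_access_time(double hit_time, double miss_penalty, double miss_rate)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}
```

With a hypothetical 1-cycle hit time, 100-cycle miss penalty, and 3% miss rate, the average is about 4 cycles: much closer to the fast level than to the slow one, which is the effect locality buys.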

General Cache Concepts

– Cache: smaller, faster, more expensive memory that caches a subset of the blocks.
– Data is copied in block-sized transfer units.
– Memory: larger, slower, cheaper memory viewed as partitioned into "blocks".

[Figure: a four-block cache above a memory partitioned into blocks 0 to 15; blocks 4 and 10 are shown as transfer units being copied between the two.]

General Cache Concepts: Hit

Request: 14
– Data in block b is needed.
– Block b is in cache: Hit!

[Figure: the cache holds blocks 8, 9, 14, and 3; memory is partitioned into blocks 0 to 15.]

General Cache Concepts: Miss

Request: 12
– Data in block b is needed.
– Block b is not in cache: Miss!
– Block b is fetched from memory.
– Block b is stored in cache.
  • Placement policy: determines where b goes.
  • Replacement policy: determines which block gets evicted (victim).

[Figure: the cache initially holds blocks 8, 9, 14, and 3; block 12 is requested from memory (blocks 0 to 15) and placed in the cache.]
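The hit and miss steps above can be sketched as a tiny simulator. The slides do not say which placement or replacement policy is in use, so this sketch assumes a block may go in any slot and that the oldest-filled slot is evicted first (FIFO). The names `struct cache`, `cache_access`, and `NUM_SLOTS` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 4   /* four-block cache, as in the figure */

struct cache {
    int  block[NUM_SLOTS];  /* block number held in each slot */
    bool valid[NUM_SLOTS];  /* false until the slot is first filled */
    int  next_victim;       /* FIFO pointer: slot overwritten on the next miss */
};

/* Returns true on a hit. On a miss, "fetches" block b by recording it in
   the slot chosen by the assumed FIFO replacement policy. */
bool cache_access(struct cache *c, int b)
{
    assert(c != NULL);
    for (int s = 0; s < NUM_SLOTS; s++)
        if (c->valid[s] && c->block[s] == b)
            return true;                 /* hit: block b is in the cache */

    int s = c->next_victim;              /* miss: choose the victim slot */
    c->block[s] = b;                     /* store block b in the cache */
    c->valid[s] = true;
    c->next_victim = (s + 1) % NUM_SLOTS;
    return false;
}
```

Starting from a cache that holds blocks 8, 9, 14, and 3, a request for block 14 hits and a request for block 12 misses; under the assumed FIFO policy the miss evicts block 8, so a second request for 12 then hits.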

Types of Cache Misses

Cold (compulsory) miss
– Cold misses occur because the cache is empty.

Conflict miss
– Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
  • E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
– Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
  • E.g., referencing blocks 0, 8, 0, 8, ... would miss every time, because 0 and 8 both map to block 0 at level k.

Capacity miss
– Occurs when the set of active cache blocks (working set) is larger than the cache.
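The (i mod 4) placement rule and the conflict-miss example can be checked with a short simulation. This is a minimal sketch; the function name `count_misses` and the constant `NUM_BLOCKS` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BLOCKS 4   /* block positions at level k */

/* Counts misses for a reference trace when block i of level k+1 may only
   be placed in position (i mod 4) of level k. */
size_t count_misses(const int *trace, size_t n)
{
    int slot[NUM_BLOCKS];
    for (size_t s = 0; s < NUM_BLOCKS; s++)
        slot[s] = -1;                    /* -1 marks an empty position */

    size_t misses = 0;
    for (size_t t = 0; t < n; t++) {
        assert(trace[t] >= 0);
        size_t pos = (size_t)trace[t] % NUM_BLOCKS;
        if (slot[pos] != trace[t]) {     /* cold or conflict miss */
            slot[pos] = trace[t];        /* evict whatever was there */
            misses++;
        }
    }
    return misses;
}
```

The trace 0, 8, 0, 8, 0, 8 misses on all six references even though three of the four positions stay empty, while the trace 0, 1, 0, 1, 0, 1 takes two cold misses and then hits.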

Examples of Caching in the Hierarchy

Cache Type             What is Cached?        Where is it Cached?    Latency (cycles)   Managed By
Registers              4-8 byte words         CPU core               0                  Compiler
TLB                    Address translations   On-chip TLB            0                  Hardware
L1 cache               64-byte block          On-chip L1             1                  Hardware
L2 cache               64-byte block          On/off-chip L2         10                 Hardware
Virtual memory         4-KB page              Main memory            100                Hardware + OS
Buffer cache           Parts of files         Main memory            100                OS
Disk cache             Disk sectors           Disk controller        100,000            Disk firmware
Network buffer cache   Parts of files         Local disk             10,000,000         AFS/NFS client
Browser cache          Web pages              Local disk             10,000,000         Web browser
Web cache              Web pages              Remote server disks    1,000,000,000      Web proxy server

Cache Memories

Cache memories are small, fast SRAM-based memories managed automatically in hardware.
– Hold frequently accessed blocks of main memory.

The CPU looks first for data in the caches (e.g., L1, L2, and L3), then in main memory.

Typical system structure:

[Figure: the CPU chip contains the register file, ALU, cache memories, and bus interface; the bus interface connects over the system bus to the I/O bridge, which connects over the memory bus to main memory.]