ECE 232 Hardware Organization and Design Part 14

ECE 232: Hardware Organization and Design
Part 14: Memory Hierarchy
Chapter 5 (4th edition), Chapter 7 (3rd edition)
http://www.ecs.umass.edu/ece232/
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

Recap: Machine Organization
Personal Computer
§ Processor (CPU), active
• Control (“brain”)
• Datapath
§ Memory, passive (where programs and data live when running)
§ Devices
• Input
• Output
ECE 232: Memory Hierarchy. Adapted from Computer Organization and Design, Patterson & Hennessy, UCB, Kundu, UMass Koren

Memory Basics
§ Users want large and fast memories!
§ Fact
• Large memories are slow
• Fast memories are small
§ Large memories use DRAM technology: Dynamic Random Access Memory
• High density, low power, cheap, slow
• Dynamic: needs to be “refreshed” regularly
• DRAM access times are 50 to 70 ns at a cost of $10 to $20 per GB
• FPM (Fast Page Mode)
• ECC (Error Correcting Code)
• SDRAM (Synchronous Dynamic RAM)
• DDR (Double Data Rate: data transferred on both clock edges), source synchronous
§ Fast memories use SRAM: Static Random Access Memory
• Low density, high power, expensive, fast
• Static: contents last “forever” (until power is lost)
• SRAM access times are 0.5 to 5 ns at a cost of $400 to $1,000 per GB

Memory Technology
§ SRAM and DRAM are random access storage
• Access time is the same for all locations (a hardware address decoder selects the location)
§ For even larger and cheaper storage than DRAM, use a hard drive (disk): sequential access
• Very slow; data is accessed sequentially, so access time depends on location; treated as I/O
• Disk access times are 5 to 20 million ns (i.e., 5 to 20 ms) at a cost of $0.20 to $2.00 per GB
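The two slides above quote price ranges for each technology. The short C sketch below multiplies each $/GB range by a capacity so the prices can be compared side by side. It uses only the numbers quoted on the slides. The type `mem_tech` and the function `cost_range` are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Price ranges in US dollars per GB, as quoted on the two slides above.
   The type and function names are hypothetical, chosen for this sketch. */
typedef struct {
    const char *name;
    double min_usd_per_gb;
    double max_usd_per_gb;
} mem_tech;

static const mem_tech TECHS[] = {
    {"SRAM", 400.0, 1000.0},
    {"DRAM", 10.0, 20.0},
    {"Disk", 0.20, 2.00},
};

/* Store the cheapest and the most expensive cost of `gb` gigabytes of `t`. */
static void cost_range(const mem_tech *t, double gb, double *lo, double *hi) {
    assert(t != NULL && gb >= 0.0);
    *lo = t->min_usd_per_gb * gb;
    *hi = t->max_usd_per_gb * gb;
}
```

At these prices, 4 GB costs $1,600 to $4,000 as SRAM, $40 to $80 as DRAM, and $0.80 to $8.00 as disk. This is why only the small levels near the processor use SRAM.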

Memory Latency Problem: Processor-DRAM Memory Performance Gap (Motivation for Memory Hierarchy)
[Figure: performance (log scale, 1 to 1000) versus time, 1980 to 2000. µProc performance grows 60%/yr (2X every 1.5 yr); DRAM performance grows 5%/yr (2X every 15 yrs); the processor-memory performance gap grows 50% per year.]
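The "grows 50% / year" annotation follows from compounding the two growth rates in the figure. The C sketch below assumes simple compound growth at exactly 60%/yr for the processor and 5%/yr for DRAM. The function name `gap_factor` is hypothetical.

```c
#include <assert.h>

/* Factor by which the processor-memory gap widens over `years` years,
   assuming processor performance compounds at `cpu_rate` per year and
   DRAM performance at `dram_rate` per year (0.60 and 0.05 in the figure). */
static double gap_factor(double cpu_rate, double dram_rate, int years) {
    assert(years >= 0);
    double yearly = (1.0 + cpu_rate) / (1.0 + dram_rate);
    double gap = 1.0;
    for (int y = 0; y < years; y++)
        gap *= yearly;              /* compound one more year */
    return gap;
}
```

One year gives 1.60 / 1.05 ≈ 1.52, which is roughly 50% per year. The same rates match the doubling times in the figure: 1.6^1.5 ≈ 2.0 and 1.05^15 ≈ 2.1.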

Need for speed
§ Assume the CPU runs at 3 GHz and completes one instruction per clock cycle
§ Every instruction requires 4 B of instruction and at least one memory access (4 B of data)
• 3 G instructions/sec * 8 B = 24 GB/sec
§ The table below gives peak performance for sequential burst transfers (performance for random access is much lower due to latency)
§ Memory bandwidth and access time are a performance bottleneck

Interface                                  Width        Frequency    Bytes/Sec
4-way interleaved PC1600 (DDR200) SDRAM    4 x 64 bits  100 MHz DDR  6.4 GB/s
Opteron HyperTransport memory bus          128 bits     200 MHz DDR  6.4 GB/s
Pentium 4 "800 MHz" FSB                    64 bits      200 MHz QDR  6.4 GB/s
PC2-6400 (DDR-II 800) SDRAM                64 bits      400 MHz DDR  6.4 GB/s
PC2-5300 (DDR-II 667) SDRAM                64 bits      333 MHz DDR  5.3 GB/s
Pentium 4 "533 MHz" FSB                    64 bits      133 MHz QDR  4.3 GB/s

FSB: Front-Side Bus. DDR: 2 transfers per clock; QDR: 4 transfers per clock.
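The arithmetic behind the 24 GB/sec figure and the Bytes/Sec column can be written out directly. Peak bandwidth is bus width in bytes × clock frequency × transfers per clock (2 for DDR, 4 for QDR). This C sketch assumes one instruction completes per cycle, as the slide does. `peak_bytes_per_sec` and `required_bytes_per_sec` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Peak transfer rate of a memory interface in bytes per second:
   width in bytes x clock frequency x transfers per clock.
   DDR transfers on both clock edges (2 per clock); QDR transfers 4 per clock. */
static uint64_t peak_bytes_per_sec(uint64_t width_bits, uint64_t clock_hz,
                                   uint64_t transfers_per_clock) {
    assert(width_bits % 8 == 0);
    return (width_bits / 8) * clock_hz * transfers_per_clock;
}

/* Bandwidth demanded by a CPU that completes one instruction per cycle and
   moves bytes_per_instr bytes per instruction (4 B instruction + 4 B data
   on the slide gives 8). */
static uint64_t required_bytes_per_sec(uint64_t clock_hz,
                                       uint64_t bytes_per_instr) {
    return clock_hz * bytes_per_instr;
}
```

For instance, `peak_bytes_per_sec(64, 400000000, 2)` gives 6,400,000,000 for PC2-6400. Even the fastest interfaces in the table therefore deliver only 6.4 / 24 ≈ 27% of the 3 GHz CPU's demand, and only for sequential bursts.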

Need for Large Memory
§ Small memories are fast
§ So just write small programs
• “640K of memory should be enough for anybody” (commonly attributed to Bill Gates, 1981, although he has denied saying it)
§ Today’s programs require large memories
• PowerPoint 2003: 25 megabytes
• Database applications may require gigabytes of memory

The Goal: Illusion of large, fast, cheap memory
§ How do we create a memory that is large, cheap and fast (most of the time)?
§ Strategy: provide a small, fast memory, called a cache, which holds a subset of the main memory
• Keep frequently accessed locations in the fast cache
• The cache retrieves more than one word at a time
• Sequential accesses are faster after the first access

Memory Hierarchy
§ Hierarchy of levels
• Uses smaller and faster memory technologies close to the processor
• Fast access time in the highest level of the hierarchy
• Cheap, slow memory furthest from the processor
§ The aim of memory hierarchy design is to have an access time close to that of the highest level and a size equal to that of the lowest level

Memory Hierarchy Pyramid
[Figure: the processor (CPU) connects through a transfer datapath (bus) to Level 1, Level 2, Level 3, ..., Level n. Levels closer to the CPU have lower access time (memory latency); levels farther from the CPU have lower cost per MB and more memory.]

Basic Philosophy
§ Move data into ‘smaller, faster’ memory
§ Operate on it
§ Move it back to ‘larger, cheaper’ memory
• How do we keep track of whether it has changed?
§ What if we run out of space in ‘smaller, faster’ memory?
§ Important concepts: latency, bandwidth

Typical Hierarchy
[Figure: CPU regs to cache: 8 B per transfer; cache to memory (cache/MM): 32 B; memory to disk (virtual memory): 4 KB.]
§ Notice that the data width is changing
• Why?
§ Bandwidth: transfer rate between the various levels
• CPU-Cache: 24 GBps
• Cache-Main: 0.5-6.4 GBps
• Main-Disk: 187 MBps (Serial ATA/1500: 1.5 Gb/s ÷ 8 bits; after 8b/10b encoding, the usable data rate is about 150 MB/s)

Bandwidth Issue
§ Fetch large blocks at a time (bandwidth)
• Supports spatial locality

    for (i = 0; i < length; i++)
        sum += array[i];

• array has spatial locality
• sum has temporal locality
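Fetching whole blocks pays off for this loop, and the C sketch below shows why by counting misses under a deliberately simplified model. In that model the upper level holds just one block, and any access outside that block is a miss that fetches the whole block. The one-block model, the 32 B block size (borrowed from the cache/MM width on the Typical Hierarchy slide), and the function names are assumptions for illustration. They do not describe a real cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (an assumption): the upper level holds a single block of
   block_bytes bytes. An access to an element outside the held block is a
   miss and fetches that whole block. Offsets are from the array start. */
static size_t count_misses_sequential(size_t length, size_t elem_bytes,
                                      size_t block_bytes) {
    assert(block_bytes > 0);
    size_t misses = 0;
    size_t held = SIZE_MAX;                  /* no block held yet */
    for (size_t i = 0; i < length; i++) {
        size_t block = (i * elem_bytes) / block_bytes;
        if (block != held) {                 /* outside the held block */
            misses++;
            held = block;                    /* fetch the whole block */
        }
    }
    return misses;
}

/* The slide's loop: sum is reused every iteration (temporal locality) and
   array[i] walks consecutive addresses (spatial locality). */
static int64_t sum_array(const int32_t *array, size_t length) {
    int64_t sum = 0;
    for (size_t i = 0; i < length; i++)
        sum += array[i];
    return sum;
}
```

Take 1,024 four-byte elements and 32 B blocks: `count_misses_sequential(1024, 4, 32)` returns 128. That is one miss per block followed by seven hits on its neighbors, for a hit rate of 896 / 1024 = 87.5%. Meanwhile `sum` would normally live in a register, the extreme case of temporal locality.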

Why Hierarchy Works: Natural Locality
§ The Principle of Locality
• Programs access a relatively small portion of the address space at any instant of time
[Figure: probability of reference (0 to 1) versus memory address (0 to 2^n - 1).]
§ Temporal Locality (locality in time): recently accessed data tend to be referenced again soon
§ Spatial Locality (locality in space): items near recently accessed ones tend to be referenced soon
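Spatial locality depends on the order of accesses as well as on the data layout. C stores a two-dimensional array row by row. Walking it row by row therefore has strong spatial locality, while walking it column by column does not. This C sketch applies a one-block toy model to a hypothetical 64 x 64 matrix of 4-byte integers with 32 B blocks. The model is an assumption that exaggerates the effect, since a real cache holding many blocks would recover some reuse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MATRIX_N 64   /* hypothetical MATRIX_N x MATRIX_N matrix of int32_t */

/* Count misses for one full traversal under the one-block toy model:
   a miss whenever an access falls outside the most recently fetched block. */
static size_t traversal_misses(bool row_major, size_t block_bytes) {
    assert(block_bytes > 0);
    size_t misses = 0;
    size_t held = SIZE_MAX;                   /* no block held yet */
    for (size_t outer = 0; outer < MATRIX_N; outer++) {
        for (size_t inner = 0; inner < MATRIX_N; inner++) {
            size_t r = row_major ? outer : inner;
            size_t c = row_major ? inner : outer;
            /* C places m[r][c] at byte offset (r * MATRIX_N + c) * 4. */
            size_t block = (r * MATRIX_N + c) * sizeof(int32_t) / block_bytes;
            if (block != held) {
                misses++;
                held = block;
            }
        }
    }
    return misses;
}
```

Row-major order yields 512 misses (one per 32 B block). Column-major order misses on all 4,096 accesses, because consecutive accesses are 64 x 4 = 256 bytes apart.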

Memory Hierarchy: Terminology
[Figure: the processor exchanges data with the upper level, which holds block X; the upper level exchanges blocks with the lower level, which holds block Y.]
§ Hit: data appears in the upper level (block X)
§ Hit Rate: the fraction of memory accesses found in the upper level
§ Miss: data needs to be retrieved from a block in the lower level (block Y)
§ Miss Rate = 1 - (Hit Rate)
§ Hit Time: time to access the upper level, which consists of the time to determine hit/miss + the upper level access time
§ Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
§ Note: Hit Time << Miss Penalty
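The slide does not state it, but these definitions combine into the usual average access time formula: average access time = hit time + miss rate × miss penalty. Every access pays the hit time, and the fraction of accesses that miss also pays the miss penalty. A minimal C sketch follows; the function names are hypothetical.

```c
#include <assert.h>

/* Hit rate = hits / accesses; the miss rate is 1 - hit rate. */
static double hit_rate(unsigned long hits, unsigned long accesses) {
    assert(accesses > 0 && hits <= accesses);
    return (double)hits / (double)accesses;
}

/* Average access time: every access pays the hit time, and misses
   additionally pay the miss penalty. */
static double avg_access_time_ns(double hit_time_ns, double miss_rate,
                                 double miss_penalty_ns) {
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time_ns + miss_rate * miss_penalty_ns;
}
```

For example, a 1 ns hit time, a 95% hit rate, and a 100 ns miss penalty give 1 + 0.05 × 100 = 6 ns. Because Hit Time << Miss Penalty, even a small miss rate is expensive.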

Current Memory Hierarchy

              Regs    L1 Cache  L2 Cache  Main Memory  Secondary Memory
Speed (ns):   1       2         6         100          10,000,000 (10 ms)
Size (MB):    0.0005  0.1       1-4       1000-6000    500,000
Cost ($/MB):  --      $10       $3        $0.01        $0.002
Technology:   Regs    SRAM      SRAM      DRAM         Disk

• Cache - Main memory: speed
• Main memory - Disk (virtual memory): capacity

How is the hierarchy managed?
§ Registers ↔ Memory
• By the compiler (or assembly language programmer)
§ Cache ↔ Main Memory
• By hardware
§ Main Memory ↔ Disks
• By a combination of hardware and the operating system (virtual memory)
• Also by the programmer (in the case of files)
[Figure: processor (control, datapath with registers), L1 cache, L2 cache, main memory, secondary memory.]

Memory Hierarchy Design: Four Questions for Memory Hierarchy
§ Q1: Where can a block be placed in the upper level? (Block placement)
• Anywhere, in a single specific place, or in one out of several specific places
§ Q2: How is a block found in the upper level? (Block identification)
§ Q3: Which block should be replaced on a miss in the upper level? (Block replacement)
• Replacement policy
§ Q4: What happens on a write in the upper level? (Write strategy)
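The three answers to Q1 are conventionally called fully associative (anywhere), direct mapped (a single specific place), and set associative (one of several places). One rule covers all three. The frames are grouped into sets, and a block may go only in set (block address mod number of sets). The C sketch below is a minimal illustration of that rule; `set_index` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Upper level with num_blocks frames grouped into sets of `ways` frames.
   A block may be placed only in set (block_address mod number_of_sets):
     ways == 1          -> direct mapped: a single specific place
     ways == num_blocks -> fully associative: one set, so anywhere
     otherwise          -> set associative: one of `ways` places */
static size_t set_index(size_t block_address, size_t num_blocks, size_t ways) {
    assert(ways > 0 && ways <= num_blocks && num_blocks % ways == 0);
    size_t num_sets = num_blocks / ways;
    return block_address % num_sets;
}
```

Consider block 12 in an 8-block upper level. Direct mapped, it must go in frame 12 mod 8 = 4. Two-way set associative, it goes in set 12 mod 4 = 0, in either of that set's two frames. Fully associative, it may go anywhere in the single set.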

Q4: What happens on a write?
§ Write policies
• Write-through: the information is written to both the upper and the lower level
  • easier to implement
  • the lower level has the most current copy of the data (data consistency, coherence)
• Write-back: the information is written only to the upper level; the modified block is written to the lower level only when it is replaced
  • uses less bandwidth, since multiple writes within a block require only one write to the lower level
  • a read miss, which causes a block to be replaced, may therefore result in writes to the lower level
§ A block in a write-back upper level can be either clean or dirty, depending on whether its content is the same as that in the lower level
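Counting the writes that reach the lower level makes the bandwidth difference between the two policies concrete. The C sketch below assumes a toy upper level with a single block frame and a dirty bit. It counts write operations only and ignores block fetches on a miss. The names `mem_access` and `lower_level_writes` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One access in a hypothetical trace. */
typedef struct {
    size_t block;      /* block address touched by this access */
    bool is_write;
} mem_access;

/* Toy upper level with a single block frame (an assumption).
   Returns how many writes reach the lower level. */
static size_t lower_level_writes(const mem_access *seq, size_t n,
                                 bool write_back) {
    size_t writes = 0;
    bool valid = false, dirty = false;
    size_t held = 0;
    for (size_t i = 0; i < n; i++) {
        if (!valid || seq[i].block != held) {
            /* Miss (read or write): the held block is replaced. Under
               write-back a dirty victim must be copied down first. */
            if (write_back && dirty)
                writes++;
            held = seq[i].block;
            valid = true;
            dirty = false;      /* freshly fetched block is clean */
        }
        if (seq[i].is_write) {
            if (write_back)
                dirty = true;   /* only the upper level is updated now */
            else
                writes++;       /* write-through: lower level updated too */
        }
    }
    /* Count the eventual write-back of a block still dirty at the end,
       so both policies leave the lower level current. */
    if (write_back && dirty)
        writes++;
    return writes;
}
```

Four writes to the same block cost four lower-level writes under write-through but only one under write-back. The sequence {write block 1, read block 2} illustrates the last bullet. Under write-back, the read miss evicts the dirty block 1, which must first be written to the lower level.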