EEL 5708 High Performance Computer Architecture Review Memory


EEL 5708 High Performance Computer Architecture
Review: Memory Hierarchy
September 10, 2004
Lotzi Bölöni
Fall 2004 EEL 5708/Bölöni Lec 4

Acknowledgements
• All the lecture slides were adapted from the slides of David Patterson (1998, 2001) and David E. Culler (2001), Copyright 1998-2002, University of California Berkeley

The Memory Abstraction
• Association of <name, value> pairs
  – typically named as byte addresses
  – often values aligned on multiples of size
• Sequence of Reads and Writes
• Write binds a value to an address
• Read of address returns most recently written value bound to that address
[Figure: memory interface signals: command (R/W), address (name), data (W), data (R), done]
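The read/write contract described on this slide can be illustrated with a minimal sketch. The `Memory` class, its method names, and the choice to return 0 for never-written addresses are assumptions made for illustration; they are not part of the lecture.

```python
class Memory:
    """Toy byte-addressed memory: a mapping from address (name) to value."""

    def __init__(self):
        self._cells = {}  # address -> most recently written byte

    def write(self, address, value):
        """Bind a byte value to an address, replacing any earlier binding."""
        if not 0 <= value <= 0xFF:
            raise ValueError("value must fit in one byte")
        self._cells[address] = value

    def read(self, address):
        """Return the most recently written value bound to the address."""
        # Unwritten locations read as 0 here; that default is an assumption.
        return self._cells.get(address, 0)


mem = Memory()
mem.write(0x1000, 7)
mem.write(0x1000, 42)      # a later write rebinds the same address
latest = mem.read(0x1000)  # the read sees the most recent write
```

Every level of a real hierarchy (cache, DRAM, disk) has to preserve this same contract, whatever it does internally to make reads fast.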

Recap: Who Cares About the Memory Hierarchy?
Processor-DRAM Memory Gap (latency)
[Figure: performance (log scale, 1 to 1000) versus time (1981 to 2000) for CPU and DRAM]
• µProc (CPU, “Joy’s Law”): 60%/yr. (2X/1.5 yr)
• DRAM: 9%/yr. (2X/10 yrs)
• Processor-Memory Performance Gap: grows 50% / year
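The "grows 50% / year" figure follows from the two growth rates on the slide, and a short calculation makes the compounding explicit. This is a sketch that assumes both curves start at parity in 1981 and grow at constant rates; the function and variable names are illustrative.

```python
CPU_GROWTH = 1.60   # 60% per year, from the slide
DRAM_GROWTH = 1.09  # 9% per year, from the slide


def gap_after(years):
    """Ratio of processor to DRAM performance after `years`, starting at parity."""
    return (CPU_GROWTH / DRAM_GROWTH) ** years


# One year of compounding: 1.60 / 1.09 - 1, a little under 50%.
yearly_gap_growth = gap_after(1) - 1.0

# Nineteen years (1981 to 2000) at that rate.
gap_1981_to_2000 = gap_after(19)
```

Under these assumptions the gap grows by roughly 47% per year, which the slide rounds to 50%, and it compounds to more than three orders of magnitude over the plotted period.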

Levels of the Memory Hierarchy

Level           Capacity           Access Time             Cost
CPU Registers   100s Bytes         <1s ns
Cache           10s-100s K Bytes   1-10 ns                 $10/MByte
Main Memory     M Bytes            100 ns-300 ns           $1/MByte
Disk            10s G Bytes        10 ms (10,000,000 ns)   $0.0031/MByte
Tape            infinite           sec-min                 $0.0014/MByte

Staging/transfer unit between adjacent levels:

Between               Xfer Unit         Managed by       Size
Registers and Cache   Instr. Operands   prog./compiler   1-8 bytes
Cache and Memory      Blocks            cache cntl       8-128 bytes
Memory and Disk       Pages             OS               512-4K bytes
Disk and Tape         Files             user/operator    Mbytes

• Upper levels are faster; lower levels are larger.

The Principle of Locality
• The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality:
  – Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  – Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
• Last 15 years, HW (hardware) relied on locality for speed
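The effect of both kinds of locality can be seen by counting misses in a toy cache. The sketch below assumes a direct-mapped cache with 64 blocks of 32 bytes; that organization, the function name, and the three access patterns are illustrative choices, not something taken from the slides.

```python
def count_misses(addresses, num_blocks=64, block_size=32):
    """Count misses in a toy direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_blocks          # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // block_size      # which memory block holds this byte
        index = block % num_blocks      # the single line that block may occupy
        tag = block // num_blocks       # identifies which block is in that line
        if tags[index] != tag:          # miss: fetch the block, evicting the old one
            tags[index] = tag
            misses += 1
    return misses


N = 4096
sequential = list(range(N))             # spatial locality: neighbouring bytes
strided = [i * 32 for i in range(N)]    # no locality: every access hits a new block
looped = list(range(256)) * 16          # temporal locality: 256 bytes reused 16 times

seq_misses = count_misses(sequential)       # 128: one miss per 32-byte block
strided_misses = count_misses(strided)      # 4096: every access misses
loop_misses = count_misses(looped)          # 8: only the first pass misses
```

All three patterns issue the same 4096 accesses, yet the miss counts differ by a factor of up to 512, which is why hardware designed around locality pays off.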

Memory Hierarchy: Terminology
• Hit: data appears in some block in the upper level (example: Block X)
  – Hit Rate: the fraction of memory accesses found in the upper level
  – Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
• Miss: data needs to be retrieved from a block in the lower level (Block Y)
  – Miss Rate = 1 - (Hit Rate)
  – Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on 21264!)
[Figure: the processor exchanges data with the upper-level memory (Blk X), which exchanges blocks with the lower-level memory (Blk Y)]
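These terms combine into the standard textbook expression for average memory access time, hit time + miss rate × miss penalty. That formula is not stated on the slide, and the numbers below (a 1-cycle hit, a 95% hit rate, a 100-cycle miss penalty) are made up for illustration.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average memory access time, in the same time unit as the inputs."""
    miss_rate = 1.0 - hit_rate          # Miss Rate = 1 - (Hit Rate)
    return hit_time + miss_rate * miss_penalty


# Hypothetical cache: times are in processor cycles.
amat = average_access_time(hit_time=1.0, hit_rate=0.95, miss_penalty=100.0)

# The same cache with a slightly better hit rate.
amat_better = average_access_time(hit_time=1.0, hit_rate=0.99, miss_penalty=100.0)
```

With these inputs the average is about 6 cycles, and raising the hit rate to 99% brings it down to about 2. Because Hit Time << Miss Penalty, small changes in miss rate dominate the result.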

Computer System Components
[Figure: Proc, Caches, Buses, Memory, adapters, Controllers, I/O Devices: Disks, Displays, Keyboards, Networks]
• All have interfaces & organizations
• Bus & Bus Protocol is key to composition => peripheral hierarchy

A Modern Memory Hierarchy
• By taking advantage of the principle of locality:
  – Present the user with as much memory as is available in the cheapest technology.
  – Provide access at the speed offered by the fastest technology.
• Requires servicing faults on the processor

Level                                             Speed (ns)                      Size (bytes)
Registers (in Processor, with Control, Datapath)  1s                              100s
On-Chip Cache, Second Level Cache (SRAM)          10s                             Ks
Main Memory (DRAM)                                100s                            Ms
Secondary Storage (Disk)                          10,000,000s (10s ms)            Gs
Tertiary Storage (Disk/Tape)                      10,000,000,000s (10s sec)       Ts

Summary
• Modern Computer Architecture is about managing and optimizing across several levels of abstraction with respect to dramatically changing technology and application load
• Key Abstractions
  – instruction set architecture
  – memory
  – bus
• Key concepts
  – HW/SW boundary
  – Compile Time / Run Time
  – Pipelining
  – Caching
• Performance Iron Triangle relates combined effects
  – Total Time = Inst. Count x CPI x Cycle Time
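A small worked example of the Iron Triangle shows why the three factors have to be considered together. The instruction count, CPI values, and cycle times below are hypothetical, chosen only to make the trade-off visible.

```python
def total_time(inst_count, cpi, cycle_time_s):
    """Total Time = Inst. Count x CPI x Cycle Time, in seconds."""
    return inst_count * cpi * cycle_time_s


# Hypothetical baseline: 1e9 instructions, CPI of 2.0, 1 ns cycle (1 GHz).
baseline = total_time(1e9, 2.0, 1e-9)

# Hypothetical redesign: 20% shorter cycle, but deeper pipelining and more
# memory stalls raise the CPI from 2.0 to 2.5.
faster_clock = total_time(1e9, 2.5, 0.8e-9)
```

Both configurations take about 2 seconds: the shorter cycle time is cancelled by the higher CPI, so the clock rate alone says little about performance.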