CSI 312 CCE 325 Computer Architecture Chapter 5

  • Slides: 74

CSI 312 / CCE 325 – Computer Architecture
Chapter 5: Memory Hierarchy
Part of the slides in this chapter were prepared by Dr. Majd Sakr, AUST 2004. Many thanks go to him for his contributions to this course.
CSI/CCE - Computer Architecture

Exploiting Memory Hierarchy
° Users want large and fast memories!
° SRAM access times are 0.5 to 5 ns, at a very high cost per GB.
° DRAM access times are 50 to 70 ns, at a cost per GB well below SRAM but well above disk.
° Disk access times are 5 to 20 million ns, at a cost of $0.50 to $2 per GB*.
* These prices are outdated, but they still show the trend.

Memory Hierarchy
[Figure: levels of the memory hierarchy, with an arrow labeled "Size & access time increase".]

Principle of Locality
§ Programs access a small proportion of their address space at any time
§ Temporal locality
  § Items accessed recently are likely to be accessed again soon
  § e.g., instructions in a loop, induction variables
§ Spatial locality
  § Items near those accessed recently are likely to be accessed soon
  § e.g., sequential instruction access, array data

Cache (1)
Our initial focus: 2 levels of memory (upper, lower)
• block: minimum unit of data
• hit: data requested is found in the upper level
• miss: data requested is not found in the upper level

Cache (2)
§ Two issues:
  § How do we know if a data item is in the cache?
  § If it is, how do we find it?
§ Our first example:
  § block size is one word of data
  § "direct mapped"
For each item of data at the lower level, there is exactly one location in the cache where it might be; i.e., many items at the lower level share each location in the upper level.

Cache Design
§ How do we organize the cache?
§ Where does each memory address map to?
  § (Remember that the cache holds a subset of memory, so multiple memory addresses map to the same cache location.)
§ How do we know which elements are in the cache?
§ How do we quickly locate them?

Direct Mapped Cache
° Mapping: cache index = (block address) modulo (number of blocks in the cache)

Direct-Mapped Cache (1/2)
° In a direct-mapped cache, each memory address is associated with one possible block within the cache
• Therefore, we only need to look in a single location in the cache to see whether the data is there
• A block is the unit of transfer between cache and memory

Direct-Mapped Cache (2/2)
[Figure: a 16-location memory (addresses 0 to F) mapped onto a 4-byte direct-mapped cache (cache indices 0 to 3).]
° Cache location 0 can be occupied by data from:
• Memory locations 0, 4, 8, ...
• With 4 blocks, any memory location that is a multiple of 4
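The modulo mapping in the figure can be checked with a few lines of Python. This is only a sketch of the 4-location example above; the names `NUM_BLOCKS`, `cache_index`, and `sharing` are made up for illustration.

```python
NUM_BLOCKS = 4  # the 4-location cache from the figure (assumed 1-byte blocks)

def cache_index(address: int, num_blocks: int = NUM_BLOCKS) -> int:
    """Return the single cache location a memory address may occupy."""
    return address % num_blocks

# Group the 16 memory locations (0x0 to 0xF) by the cache location they share.
sharing = {
    index: [addr for addr in range(16) if cache_index(addr) == index]
    for index in range(NUM_BLOCKS)
}

for index, addresses in sharing.items():
    print(index, [hex(a) for a in addresses])
```

Every cache location is shared by four memory locations here, which is why the cache needs extra information to tell them apart.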

Issues with Direct-Mapped
° Since multiple memory addresses map to the same cache index, how do we tell which one is in there?
° What if we have a block size > 1 byte?
° Answer: divide the memory address into three fields:

  ttttttttt   iiiii   oooo
  Tag         Index   Offset

• tag: to check if we have the correct block
• index: to select the block (HEIGHT)
• offset: byte offset within the block (WIDTH)

Direct-Mapped Cache Terminology
° All fields are read as unsigned integers.
° Index: specifies the cache index (which "row" of the cache we should look in)
° Offset: once we've found the correct block, the offset specifies which byte within the block we want
° Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location

Caching Terminology
° When we try to read memory, 3 things can happen:
1. cache hit: the cache block is valid and contains the proper address, so read the desired word
2. cache miss: nothing is in the cache at the appropriate block, so fetch from memory
3. cache miss, block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory (the cache always holds a copy)

Direct-Mapped Cache Example (1/3)
° Suppose we have 16 KB of data in a direct-mapped cache with 4-word blocks
° Determine the size of the tag, index and offset fields if we are using a 32-bit architecture
° Offset
• need to specify the correct byte within a block
• a block contains 4 words (in MIPS, words are aligned to multiples of 4 bytes) = 16 bytes = 2^4 bytes
• need 4 bits to specify the correct byte

Direct-Mapped Cache Example (2/3)
° Index: (index into an "array of blocks")
• need to specify the correct row in the cache
• cache contains 16 KB = 2^14 bytes
• block contains 2^4 bytes (4 words)
• # blocks/cache = (bytes/cache) / (bytes/block) = (2^14 bytes/cache) / (2^4 bytes/block) = 2^10 blocks/cache
• need 10 bits to specify this many rows

Direct-Mapped Cache Example (3/3)
° Tag: use the remaining bits as the tag
• tag length = address length - offset - index = 32 - 4 - 10 bits = 18 bits
• so the tag is the leftmost 18 bits of the memory address
° Why not use the full 32-bit address as the tag?
• All bytes within a block share the same block address, so the offset bits (4 b) are not needed in the tag
• The index must be the same for every address within a block, so it is redundant in the tag check and can be left off to save memory (here 10 bits)
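The three steps of this example (offset, index, tag) can be sketched as a small Python helper. The function name `field_widths` and its parameters are illustrative, and the sketch assumes that both the cache size and the block size are powers of two.

```python
def field_widths(cache_bytes: int, block_bytes: int, address_bits: int = 32):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    Assumes cache_bytes and block_bytes are powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1   # log2(bytes per block)
    num_blocks = cache_bytes // block_bytes      # rows in the cache
    index_bits = num_blocks.bit_length() - 1     # log2(number of blocks)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# 16 KB of data, 4-word (16-byte) blocks, 32-bit addresses
tag_bits, index_bits, offset_bits = field_widths(16 * 1024, 16)
print(tag_bits, index_bits, offset_bits)
```

For the cache in the example this reproduces the 18-bit tag, 10-bit index and 4-bit offset derived above.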

TIO: a great cache mnemonic
° Address fields: Tag | Index | Offset
• Index field: H bits, HEIGHT = 2^H (# of blocks)
• Offset field: W bits, WIDTH = 2^W (size of one block, B/block)
° AREA (cache size, B) = HEIGHT (# of blocks) * WIDTH (size of one block, B/block)
° 2^(H+W) = 2^H * 2^W

Accessing data in a direct mapped cache
° Ex.: 16 KB of data, direct-mapped, 4-word blocks
° Read 4 addresses
1. 0x00000014
2. 0x0000001C
3. 0x00000034
4. 0x00008014
° Memory values (only the cache/memory level of the hierarchy is considered):

Address (hex)   Value of Word
...
00000010        a
00000014        b
00000018        c
0000001C        d
...
00000030        e
00000034        f
00000038        g
0000003C        h
...
00008010        i
00008014        j
00008018        k
0000801C        l
...

Accessing data in a direct mapped cache
° 4 Addresses:
• 0x00000014, 0x0000001C, 0x00000034, 0x00008014
° 4 Addresses divided (for convenience) into Tag, Index, Byte Offset fields:

Address      Tag (18 bits)        Index (10 bits)  Offset (4 bits)
0x00000014   000000000000000000   0000000001       0100
0x0000001C   000000000000000000   0000000001       1100
0x00000034   000000000000000000   0000000011       0100
0x00008014   000000000000000010   0000000001       0100
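The field split above is plain shifting and masking. A minimal Python sketch, assuming the 18/10/4-bit layout of this example (the name `split_address` is hypothetical):

```python
OFFSET_BITS = 4   # 16-byte blocks
INDEX_BITS = 10   # 1024 rows

def split_address(addr: int):
    """Split a 32-bit address into (tag, index, offset) as unsigned integers."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

for addr in (0x00000014, 0x0000001C, 0x00000034, 0x00008014):
    tag, index, offset = split_address(addr)
    print(f"0x{addr:08X}: tag={tag} index={index} offset=0x{offset:X}")
```

Note that 0x00000014 and 0x00008014 share index 1 but differ in tag (0 versus 2), which is what makes the fourth read below a conflict.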

16 KB Direct Mapped Cache, 16 B blocks
° Valid bit: determines whether anything is stored in that row (when the computer is initially turned on, all entries are invalid)

Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
0      0
1      0
2      0
3      0
4      0
5      0
6      0
7      0
...
1022   0
1023   0

1. Read 0x00000014
° Tag field = 0, Index field = 0000000001, Offset = 0100
° So we read block 1 (index 0000000001)
° No valid data
° So load that data into the cache, setting the tag and valid bit
° Read from the cache at the offset, return word b

Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
0      0
1      1      0    a      b      c      d
2      0
3      0
4      0
5      0
6      0
7      0
...
1022   0
1023   0

2. Read 0x0000001C = 0…00 0..001 1100
° Tag field = 0, Index field = 0000000001, Offset = 1100
° Index is valid
° Index valid, tag matches
° Index valid, tag matches, return d

Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
0      0
1      1      0    a      b      c      d
2      0
3      0
4      0
5      0
6      0
7      0
...
1022   0
1023   0

3. Read 0x00000034 = 0…00 0..011 0100
° Tag field = 0, Index field = 0000000011, Offset = 0100
° So read block 3
° No valid data
° Load that cache block, return word f

Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
0      0
1      1      0    a      b      c      d
2      0
3      1      0    e      f      g      h
4      0
5      0
6      0
7      0
...
1022   0
1023   0

4. Read 0x00008014 = 0…10 0..001 0100
° Tag field = 2 (binary …10), Index field = 0000000001, Offset = 0100
° So read cache block 1; data is valid
° Cache block 1 tag does not match (0 != 2)
° Miss, so replace block 1 with new data and tag
° And return word j

Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
0      0
1      1      2    i      j      k      l
2      0
3      1      0    e      f      g      h
4      0
5      0
6      0
7      0
...
1022   0
1023   0
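The four reads above can be replayed with a small simulator. This is a sketch under the assumptions of the example (16 KB direct-mapped cache, 16-byte blocks, the memory values a to l); the class `DirectMappedCache` and the `MEMORY` dictionary are illustrative names, not part of the original slides.

```python
OFFSET_BITS, INDEX_BITS, WORD_BYTES = 4, 10, 4

# Memory contents from the example: word address -> value of word
MEMORY = {
    0x10: "a", 0x14: "b", 0x18: "c", 0x1C: "d",
    0x30: "e", 0x34: "f", 0x38: "g", 0x3C: "h",
    0x8010: "i", 0x8014: "j", 0x8018: "k", 0x801C: "l",
}

class DirectMappedCache:
    def __init__(self):
        # One row per index; all entries invalid at power-on.
        self.rows = [{"valid": False, "tag": None, "words": None}
                     for _ in range(1 << INDEX_BITS)]

    def read(self, addr):
        offset = addr & ((1 << OFFSET_BITS) - 1)
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        row = self.rows[index]
        if row["valid"] and row["tag"] == tag:
            outcome = "hit"
        else:
            outcome = "miss, replace" if row["valid"] else "miss"
            base = addr - offset  # address of the first word in the block
            row["words"] = [MEMORY.get(base + i * WORD_BYTES) for i in range(4)]
            row["valid"], row["tag"] = True, tag
        return outcome, row["words"][offset // WORD_BYTES]

cache = DirectMappedCache()
results = [cache.read(a) for a in (0x14, 0x1C, 0x34, 0x8014)]
print(results)
```

Calling `cache.read` again on the same object continues from the final cache state shown above, so it can also be used to check further reads.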

Do an example yourself. What happens?
° Choose from:
• Cache: Hit, Miss, Miss w. replace
• Values returned: a, b, c, d, e, ..., k, l
° Read address 0x00000030? (Index = 0000000011, Offset = 0000)
° Read address 0x0000001c? (Index = 0000000001, Offset = 1100)

Cache:
Index  Valid  Tag  0x0-3  0x4-7  0x8-b  0xc-f
0      0
1      1      2    i      j      k      l
2      0
3      1      0    e      f      g      h
4      0
5      0
6      0
7      0
...

Answers
° 0x00000030: a hit
• Index = 3, Tag matches, Offset = 0, value = e
° 0x0000001c: a miss
• Index = 1, Tag mismatch, so replace from memory, Offset = 0xc, value = d
° Since these are reads, the values must equal the memory values whether or not they are cached:
• 0x00000030 = e
• 0x0000001c = d

Memory Address   Value of Word
...
00000010         a
00000014         b
00000018         c
0000001c         d
...
00000030         e
00000034         f
00000038         g
0000003c         h
...
00008010         i
00008014         j
00008018         k
0000801c         l
...

Open your Eyes…
° Drawbacks of Larger Data Block Size (level 2)
• Larger block size means larger miss penalty
  - on a miss, it takes longer to load a new block from the next level
• If the block size is too big relative to the cache size, then there are too few blocks
  - Result: miss rate goes up
° In general, minimize
  Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate

Review…
° Hit Time = time to find and retrieve data from the current level cache
° Miss Penalty = average time to retrieve data on a current level miss (includes the possibility of misses on successive levels of the memory hierarchy)
° Hit Rate = % of requests that are found in the current level cache
° Miss Rate = 1 - Hit Rate

Review…
° Average Memory Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate
° Assume the hit time is paid on every access, hit or miss, so the Hit Rate weighting is absorbed into the Hit Time term:
° Average Memory Access Time = Hit Time + Miss Penalty x Miss Rate

Bits in Cache - Home Read
[Figures not reproduced.]

Hits vs. Misses (1)
° Read hits
• this is what we want!
° Read misses
• stall the CPU, fetch the block from memory, deliver it to the cache, restart

Hits vs. Misses (2)
° Write hits:
• write-through: replace the data in both cache and memory, so the two are always consistent
• write-back: write the data only into the cache, and write the block back to memory later
° Write misses:
• read the entire block into the cache, then write the word

Performance (1)
° Increasing the block size in the cache tends to decrease the miss rate:
[Figure not reproduced.]

Performance (2)
° Simplified model:
  CPU execution time = (CPU execution clk cycles + Mem-stall clk cycles) × cycle time
  Mem-stall cycles = # of instructions × miss ratio × miss penalty
  (applies to reads and writes; for writes, add write buffer stalls)
° Two ways of improving performance:
• decreasing the miss ratio
• decreasing the miss penalty
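The simplified model can be written out directly in Python. The function names and the sample numbers below are made up for illustration only; they are not measurements from the course.

```python
def mem_stall_cycles(instructions: int, miss_ratio: float, miss_penalty: float) -> float:
    """Mem-stall cycles = # of instructions x miss ratio x miss penalty."""
    return instructions * miss_ratio * miss_penalty

def cpu_execution_time(exec_cycles: float, stall_cycles: float, cycle_time: float) -> float:
    """CPU execution time = (execution cycles + mem-stall cycles) x cycle time."""
    return (exec_cycles + stall_cycles) * cycle_time

# Hypothetical program: 1000 instructions, 2000 execution cycles,
# 5% miss ratio, 20-cycle miss penalty, 1 ns cycle time.
stalls = mem_stall_cycles(1000, 0.05, 20)
total_ns = cpu_execution_time(2000, stalls, 1.0)
print(stalls, total_ns)
```

Halving either the miss ratio or the miss penalty halves the stall term, which is why the slide lists these as the two ways of improving performance.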

Example
° Assume
• Hit Time = 1 cycle
• Miss rate = 5%
• Miss penalty = 20 cycles
• Calculate AMAT…
° Average memory access time = 1 + 0.05 x 20 = 1 + 1 cycles = 2 cycles
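The same calculation as a short Python sketch, using the simplified formula AMAT = Hit Time + Miss Penalty x Miss Rate (the function name `amat` is illustrative):

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty

# Values from the example: 1-cycle hit, 5% miss rate, 20-cycle miss penalty
example_amat = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)
print(example_amat)  # about 2 cycles
```

Even a 5% miss rate doubles the average access time here, because each miss costs twenty times as much as a hit.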

Types of Cache Misses (1/2)
° "Three Cs" Model of Misses
° 1st C: Compulsory Misses
• occur when a program is first started
• the cache does not contain any of that program's data yet, so misses are bound to occur
• can't be avoided easily, so we won't focus on these in this course

Types of Cache Misses (2/2)
° 2nd C: Conflict Misses
• a miss that occurs because two distinct memory addresses map to the same cache location
• two blocks (which happen to map to the same location) can keep overwriting each other
• a big problem in direct-mapped caches
• how do we lessen the effect of these?
° Dealing with Conflict Misses
• Solution 1: Make the cache size bigger
  - Fails at some point
• Solution 2: Let multiple distinct blocks fit in the same cache index?

Associative Caches
§ Fully associative
• Allow a given block to go in any cache entry
• Requires all entries to be searched at once
• Comparator per entry (expensive)
§ n-way set associative
• Each set contains n entries
• Block number determines which set
  - (Block number) modulo (#Sets in cache)
• Search all entries in a given set at once
• n comparators (less expensive)

Associative Cache Example
[Figure not reproduced.]

Spectrum of Associativity
° For a cache with 8 entries
[Figure not reproduced.]

Replacement Policy
° Direct mapped: no choice
° Set associative
• choose among the entries in the set
° Least-recently used (LRU)
• Choose the one unused for the longest time
° Random
• Gives approximately the same performance as LRU for high associativity

Associativity Example
° Compare 4-block caches
• Direct mapped, 2-way set associative, fully associative
• Block access sequence: 0, 8, 0, 6, 8
° Direct mapped: to which cache block does each block address map?

Block address   Cache block
0               (0 mod 4) = 0
6               (6 mod 4) = 2
8               (8 mod 4) = 0

Block     Cache   Hit/miss   Cache content after access
address   index              0        1        2        3
0         0       miss       Mem[0]
8         0       miss       Mem[8]
0         0       miss       Mem[0]
6         2       miss       Mem[0]            Mem[6]
8         0       miss       Mem[8]            Mem[6]

Associativity Example
° 2-way set associative (LRU replacement)

Block address   Cache set
0               (0 mod 2) = 0
6               (6 mod 2) = 0
8               (8 mod 2) = 0

Block     Cache   Hit/miss   Cache content after access
address   index              Set 0              Set 1
0         0       miss       Mem[0]
8         0       miss       Mem[0]   Mem[8]
0         0       hit        Mem[0]   Mem[8]
6         0       miss       Mem[0]   Mem[6]
8         0       miss       Mem[8]   Mem[6]

° Fully associative (block access sequence: 0, 8, 0, 6, 8)

Block     Hit/miss   Cache content after access
address
0         miss       Mem[0]
8         miss       Mem[0]   Mem[8]
0         hit        Mem[0]   Mem[8]
6         miss       Mem[0]   Mem[8]   Mem[6]
8         hit        Mem[0]   Mem[8]   Mem[6]
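The three organizations can be compared with one small simulator. This is a sketch assuming LRU replacement within each set and one-block entries; the function `simulate` and its parameters are illustrative names.

```python
from collections import OrderedDict

def simulate(block_addresses, num_blocks, ways):
    """Return a hit/miss list for an LRU set-associative cache.

    ways=1 gives a direct-mapped cache; ways=num_blocks gives fully associative.
    """
    num_sets = num_blocks // ways
    # Each set keeps its blocks ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for block in block_addresses:
        entries = sets[block % num_sets]
        if block in entries:
            entries.move_to_end(block)       # mark as most recently used
            outcomes.append("hit")
        else:
            if len(entries) == ways:
                entries.popitem(last=False)  # evict the least recently used
            entries[block] = True
            outcomes.append("miss")
    return outcomes

SEQUENCE = [0, 8, 0, 6, 8]
direct = simulate(SEQUENCE, num_blocks=4, ways=1)
two_way = simulate(SEQUENCE, num_blocks=4, ways=2)
fully = simulate(SEQUENCE, num_blocks=4, ways=4)
print(direct.count("miss"), two_way.count("miss"), fully.count("miss"))
```

For this access sequence the miss count drops from 5 to 4 to 3 as associativity increases, matching the tables above.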

How Much Associativity
° Increased associativity decreases miss rate
[Figure not reproduced.]

Set Associative Cache Organization
[Figure not reproduced.]

Fully Associative Cache (1/3)
° Memory address fields:
• Tag: same as before
• Offset: same as before
• Index: nonexistent
° What does this mean?
• no "rows": any block can go anywhere in the cache
• must compare with all tags in the entire cache to see if the data is there

Fully Associative Cache (2/3)
° Fully Associative Cache (e.g., 32 B block)
• compare tags in parallel
[Figure: address bits 31 to 5 form the Cache Tag (27 bits long) and bits 4 to 0 form the Byte Offset. Each cache entry holds a Cache Tag, a Valid bit, and Cache Data (bytes B31 to B0), with one comparator (=) per entry.]

Fully Associative Cache (3/3)
§ Benefit of Fully Associative Cache
  § No conflict misses (since data can go anywhere)
§ Drawbacks of Fully Associative Cache
  § Need a hardware comparator for every single entry

N-Way Set Associative Cache (1/3)
° Memory address fields:
• Tag: same as before
• Offset: same as before
• Index: points us to the correct "row" (called a set in this case)
° So what's the difference?
• each set contains multiple blocks
• once we've found the correct set, we must compare with all tags in that set to find our data

N-Way Set Associative Cache (2/3)
° Summary:
• the cache is direct-mapped with respect to sets
• each set is fully associative
• basically N direct-mapped caches working in parallel: each has its own valid bit and data
° Given a memory address:
• Find the correct set using the Index value.
• Compare the Tag with all Tag values in the determined set.
• If a match occurs, it is a hit; otherwise, a miss.
• Finally, use the Offset field as usual to find the desired data within the block.
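The lookup steps above can be sketched in Python. The data layout here (a list of sets, each a list of entries with `valid`, `tag` and `data` keys) and the tiny 2-way cache are assumptions made for illustration; real hardware compares all tags in the set in parallel rather than in a loop.

```python
def lookup(cache_sets, addr, index_bits, offset_bits):
    """Return ("hit", byte) or ("miss", None) for a set-associative cache."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    # Step 1: the index selects the set. Step 2: compare the tag with every entry.
    for entry in cache_sets[index]:
        if entry["valid"] and entry["tag"] == tag:
            # Step 3: hit. Step 4: the offset selects the byte within the block.
            return "hit", entry["data"][offset]
    return "miss", None

# Hypothetical 2-way cache with 2 sets (1 index bit) and 4-byte blocks (2 offset bits)
empty = {"valid": False, "tag": None, "data": None}
cache_sets = [
    [{"valid": True, "tag": 0b10, "data": ["A", "B", "C", "D"]}, dict(empty)],
    [dict(empty), dict(empty)],
]

hit_result = lookup(cache_sets, 0b10_0_11, index_bits=1, offset_bits=2)
miss_result = lookup(cache_sets, 0b10_1_11, index_bits=1, offset_bits=2)
print(hit_result, miss_result)
```

With 1-entry sets this reduces to the direct-mapped lookup, and with a single set it becomes the fully associative search.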

N-Way Set Associative Cache (3/3)
° What's so great about this?
• even a 2-way set associative cache avoids a lot of conflict misses
• hardware cost isn't that bad: only need N comparators
° In fact, for a cache with M blocks,
• it's direct-mapped if it's 1-way set associative
• it's fully associative if it's M-way set associative
• so these two are just special cases of the more general set associative design

Associative Cache Example (1)
[Figure: a 16-location memory (addresses 0 to F) mapped onto a 4-byte direct-mapped cache (cache indices 0 to 3).]
° Recall this is how a simple direct-mapped cache looked.
° This is also a 1-way set-associative cache!

Associative Cache Example (2)
[Figure: the same 16-location memory (addresses 0 to F) mapped onto a cache whose four entries have cache indices 0, 0, 1, 1.]
° Here's a simple 2-way set associative cache.