CS 201 Cache Microarchitecture Gerson Robboy Portland State

CS 201 Cache Micro-architecture Gerson Robboy Portland State University Topics n n n Generic cache memory organization Direct mapped caches Set associative caches What about writing to memory? Multiple processors sharing memory

Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. n Hold frequently accessed blocks of main memory CPU looks first for data in L 1, then in L 2, then in main memory. Typical bus structure: CPU chip register file cache bus L 2 cache – 2– L 1 cache bus interface ALU system bus memory bus I/O bridge main memory 15 -213, F’ 02

On a cache miss… Data is gotten from memory, stored in both L 1 and L 2. The next access will be an L 1 hit. If evicted from L 1, an L 2 hit is still likely. This is true for both reads and writes. – 3– 15 -213, F’ 02

On a store, in case of cache miss… l Pre-read the cache line from memory into both L 1 and L 2 caches. l Store the data value into L 1. l The data is written through to L 2. Written back to memory later. Subsequent stores may write more data into the same line in cache. – 4– 15 -213, F’ 02

The problem with cache design Given an instruction: movl <address>, %<reg> Search the L 1 cache, find the address, and move the data from L 1 to register n All in one clock cycle In case of an L 1 miss and L 2 hit: n n Search L 1 cache, find it’s not there. Search the L 2 cache, find the address, and move the data from L 2 to register Also replicate the cache line in L 1 In at most 10 clock cycles You need more than just fast SRAM to do that. – 5– 15 -213, F’ 02

General Organization of a Cache is an array of sets. Each set contains one or more lines. 1 valid bit t tag bits per line valid S= 0 1 • • • B– 1 • • • set 0: Each line holds a block of data. 2 s sets tag B = 2 b bytes per cache block valid tag 0 1 • • • B– 1 1 • • • B– 1 E lines per set • • • set 1: valid tag 0 • • • valid tag • • • set S-1: valid – 6– 0 tag 0 Cache size: C = B x E x S data bytes 15 -213, F’ 02

Addressing Caches Address A: t bits set 0: set 1: v tag 0 • • • 0 1 • • • B– 1 1 • • • B– 1 • • • set S-1: – 7– v tag 0 • • • 0 m-1 s bits b bits 0 <tag> <set index> <block offset> The word at address A is in the cache if the tag bits in one of the <valid> lines in set <set index> match <tag>. The word contents begin at offset <block offset> bytes from the beginning of the block. 15 -213, F’ 02

Exercise For each cache in the table, m=number of address bits, C=cache size in bytes, B=block size, and E=lines/set. Determine the number of cache sets (S), tag bits (t), set index bits (s), and block offset bits (b). m C B E 32 1024 4 1 32 1024 8 4 32 1024 32 32 – 8– S t s b 15 -213, F’ 02

Direct-Mapped Cache Simplest kind of cache Characterized by exactly one line per set 0: valid tag cache block set 1: valid tag cache block E=1 lines per set • • • set S-1: – 9– valid tag cache block 15 -213, F’ 02

Accessing Direct-Mapped Caches Set selection n Use the set index bits to determine the set of interest. selected set 0: valid tag cache block set 1: valid tag cache block • • • t bits m-1 tag – 10 – s bits b bits 0 0 1 set index block offset 0 set S-1: valid tag cache block 15 -213, F’ 02

Accessing Direct-Mapped Caches Line matching and word selection n n Line matching: Find a valid line in the selected set with a matching tag Word selection: Then extract the word =1? (1) The valid bit must be set 0 selected set (i): 1 0110 1 2 3 4 w 0 5 w 1 w 2 (2) The tag bits in the cache = ? line must match the tag bits in the address m-1 – 11 – t bits 0110 tag 6 s bits b bits i 100 set index block offset 0 7 w 3 (3) If (1) and (2), then cache hit, and block offset selects starting byte. 15 -213, F’ 02

Direct-Mapped Cache Simulation t=1 s=2 x xx m=4 bits/address, M=16 byte address space, B=2 bytes/block, S=4 sets, E=1 entry/set b=1 x v 11 Address trace (reads): 0 [00002], 1 [00012], 13 [11012], 8 [10002], 0 [00002] (miss) tag data 0 m[1] m[0] M[0 -1] (1) (3) v (4) – 12 – 13 [11012] (miss) v tag data 8 [10002] (miss) tag data 11 1 m[9] m[8] M[8 -9] 1 1 M[12 -13] 1 1 0 m[1] m[0] M[0 -1] 1 1 1 m[13] m[12] M[12 -13] v (5) 0 [00002] (miss) tag data 11 0 m[1] m[0] M[0 -1] 11 1 m[13] m[12] M[12 -13] 15 -213, F’ 02

Why Use Middle Bits as Index? 4 -line Cache 00 01 10 11 0000 0001 0010 0011 0100 0101 High-Order Bit Indexing 0110 n Adjacent memory lines would 0111 map to same cache entry 1000 n Poor use of spatial locality 1001 Middle-Order Bit Indexing 1010 n Consecutive memory lines map 1011 to different cache lines 1100 n Can hold C-byte region of address space in cache at one 1101 time 1110 1111 – 13 – High-Order Bit Indexing Middle-Order Bit Indexing 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 15 -213, F’ 02

Exercise t=1 s=2 x xx b=1 x m=4 bits/address, M=16 byte address space, B=2 bytes/block, S=4 sets, E=1 entry/set Address trace (reads): 4 [01002], 1 [00012], 13 [11012], 8 [10002], 0 [00002], 5 [0101] Draw a chart showing the cache misses and hits. What is the problem with one entry per set? – 14 – 15 -213, F’ 02

Set Associative Caches Characterized by more than one line per set n This example is a two-way set associative cache set 0: set 1: valid tag cache block E=2 lines per set • • • set S-1: – 15 – valid tag cache block 15 -213, F’ 02

Accessing Set Associative Caches Set selection n identical to direct-mapped cache set 0: Selected set 1: valid tag cache block • • • t bits m-1 tag – 16 – set S-1: s bits b bits 0 0 1 set index block offset 0 valid tag cache block 15 -213, F’ 02

Accessing Set Associative Caches Line matching and word selection n must compare the tag in each valid line in the selected set. =1? (1) The valid bit must be set. 0 selected set (i): 1 1001 1 0110 (2) The tag bits in one of the cache lines must match the tag bits in the address 2 3 4 w 0 t bits 0110 tag 5 6 w 1 w 2 7 w 3 (3) If (1) and (2), then cache hit, and block offset selects starting byte. = ? m-1 – 17 – 1 s bits b bits i 100 set index block offset 0 15 -213, F’ 02

Exercise t=2 s=1 xx x b=1 x m=4 bits/address, M=16 byte address space, B=2 bytes/block, S=2 sets, E=2 entries/set Address trace (reads): 4 [01002], 2 [00102], 13 [11012], 3 [00112], 5 [0101], 0 [0000] Draw a chart showing the cache misses and hits. – 18 – 15 -213, F’ 02

Associative Caches A cache with N lines per set is called an N-way set associative cache For example, a typical microprocessor might have a 4 way set associative cache – 19 – 15 -213, F’ 02

Exercise A processor has 32 bit addresses. The L 2 cache is 256 K-bytes in size, 8 -way set associative, with a block size of 64 bytes. How many block offset bits are there, how many set index bits, and how many tag bits? Given an address 0 x 30004 a 5 c What is the block offset, what is the set index, and what is the tag? – 20 – 15 -213, F’ 02

Fully associative caches A cache with exactly one set is called a fully associative cache n n All lines in the cache are in that one set Each line is uniquely identified by the tag bits alone The problem with fully associative caches n n – 21 – There are many cache lines in one set The CPU must find a matching tag very fast (less than one clock cycle) You need logic to compare all the tags in parallel This can get expensive 15 -213, F’ 02

Review A direct mapped cache has one line per set n There as many sets as there are cache lines in the cache n Likelihood of contention l “conflict misses” l Cache entries get evicted when the cache isn’t full A fully associative cache has exactly one set n n All lines in the cache belong to that one set Problem of quick search for a matching tag A set associative cache has several lines per set n – 22 – An N way associative cache has N lines per set 15 -213, F’ 02

What about writing to memory? Written data is also cached. On a cache miss, pre-read the cache line into the cache. Then write the data into the cache line. Subsequent reads or writes will have a cache hit. – 23 – 15 -213, F’ 02

Writing to memory How does the data get from the cache to memory? Write-through cache: Data is written through to memory at the time of the “store” instruction Write-back cache: New data is stored in the cache and written to memory later. What is the problem with a write-through cache? What is the problem with a write-back cache? With write-back, how does the CPU decide when to write the data to memory? – 24 – 15 -213, F’ 02

What about multi-processors? What’s the problem with write-back caches? register file cache bus L 2 cache L 1 cache ALU bus interface system bus register file cache bus L 2 cache – 25 – L 1 cache bus interface ALU memory bus I/O bridge main memory 15 -213, F’ 02