10/20 Lecture Topics: HW 3 Problem 2, Caches
10/20: Lecture Topics
• HW 3 Problem 2
• Caches
  – Types of cache misses
  – Cache performance
  – Cache tradeoffs
  – Cache summary
• Input/Output
  – Types of I/O Devices
  – How devices communicate with the rest of the system
    • communicating with the processor
    • communicating with memory
Problem #2 on HW 3
        move $a0, $s0
        move $a1, $s1
        move $a2, $s2
        move $a3, $s3
        # Position A
        jal  Add4
        # Position B
        move $t0, $v0
        move $a0, $s4
        move $a1, $s5
        move $a2, $s6
        move $a3, $s7
        # Position C
        jal  Add4
        # Position D
        move $t1, $v0
        add  $t2, $t0, $t1

Add4:
        # Position E
        jal  Add2
        # Position F
        move $s0, $v0
        move $a0, $a2
        move $a1, $a3
        # Position G
        jal  Add2
        # Position H
        move $s1, $v0
        add  $v0, $s0, $s1
        # Position I
        jr   $ra

Add2:
        add  $v0, $a0, $a1
        jr   $ra
Preservation Conventions
Preserved                            Not Preserved
Saved registers: $s0-$s7             Temporary registers: $t0-$t9
Stack pointer register: $sp          Argument registers: $a0-$a3
Return address register: $ra         Return value registers: $v0-$v1
Stack above the stack pointer        Stack below the stack pointer
Callee-Saved Registers
Add4:
        jal  Add2
        move $s0, $v0
        move $a0, $a2
        move $a1, $a3
        jal  Add2
        move $s1, $v0
        add  $v0, $s0, $s1
        jr   $ra
Caller-Saved Registers
        move $a0, $s0
        move $a1, $s1
        move $a2, $s2
        move $a3, $s3
        jal  Add4
        move $t0, $v0
        move $a0, $s4
        move $a1, $s5
        move $a2, $s6
        move $a3, $s7
        jal  Add4
        move $t1, $v0
        add  $t2, $t0, $t1
Tag, Index, Block Offset
• Recall that an address can be decomposed into [tag, index, block offset]
• The general rule for determining this decomposition is to start from the right and work to the left
• Be careful of word vs. byte addresses
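
The right-to-left rule can be sketched with shifts and masks. This is a minimal illustration, assuming a byte address and field widths that are already known; the function name `split_address` and its parameters are made up for this example and do not come from the slides.

```python
def split_address(addr, index_bits, offset_bits):
    """Split a byte address into (tag, index, block_offset).

    Works from the right: the lowest bits are the block offset,
    the next bits are the index, and whatever remains is the tag.
    """
    block_offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, block_offset


# Example: 7-bit address 101 10 11 with 2 index bits and 2 offset bits.
tag, index, block_offset = split_address(0b1011011, index_bits=2, offset_bits=2)
print(tag, index, block_offset)  # 5 2 3
```

If the address is a word address rather than a byte address, the byte-within-word bits are not part of it, so the offset width passed in would be correspondingly smaller.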
Steps to bits for tag, index, b.o.
• Step 1: Determine how many bits for the block offset. If the block size is 2^b bytes, then b bits are required for the block offset
• Step 2: Determine how many blocks fit in the cache: (bytes in cache)/(bytes in a block)
• Step 3: Determine how many rows (unique indices) the cache has
  – For direct mapped, rows = number of blocks
  – For fully associative, rows = 1
  – For set associative, rows = (number of blocks)/associativity
Steps to bits for tag, index, b.o. (continued)
• Step 4: Determine how many bits are needed to represent the index. If there are 2^r rows, then you need r bits
• Step 5: Tag bits are whatever is left over: (address bits) - (index bits from Step 4) - (block offset bits from Step 1)
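
The five steps can be written out as a short function. This is a sketch under stated assumptions: all sizes are powers of two, addresses are byte addresses, and the address width defaults to 32 bits. The name `field_widths` and its parameters are hypothetical, chosen only for illustration.

```python
def field_widths(cache_bytes, block_bytes, associativity=1, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a cache.

    Assumes cache_bytes, block_bytes and associativity are powers of two.
    Direct mapped: associativity = 1.
    Fully associative: associativity = number of blocks.
    """
    # Step 1: block size of 2^b bytes needs b offset bits.
    offset_bits = block_bytes.bit_length() - 1
    # Step 2: how many blocks fit in the cache.
    num_blocks = cache_bytes // block_bytes
    # Step 3: how many rows (unique indices) the cache has.
    num_rows = num_blocks // associativity
    # Step 4: 2^r rows need r index bits.
    index_bits = num_rows.bit_length() - 1
    # Step 5: the tag is whatever is left over.
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# A 1 KB direct-mapped cache with 16-byte blocks.
print(field_widths(1024, 16))       # (22, 6, 4)
# The same cache made fully associative (64 blocks, one row).
print(field_widths(1024, 16, 64))   # (28, 0, 4)
```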
Cache Examples
• 4 Kbyte, 8-way associative cache with 2 words per block
  – How do you split up the address?
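
One way to work this example, assuming 32-bit byte addresses and 4-byte words (neither is stated on the slide) and taking 4 Kbyte to mean 4096 bytes:

```latex
\begin{align*}
\text{block size} &= 2 \text{ words} \times 4 \text{ bytes/word} = 8 = 2^{3} \text{ bytes}
  &&\Rightarrow\ 3 \text{ block offset bits} \\
\text{blocks} &= 4096 / 8 = 512 \\
\text{rows} &= 512 / 8 = 64 = 2^{6}
  &&\Rightarrow\ 6 \text{ index bits} \\
\text{tag bits} &= 32 - 6 - 3 = 23
\end{align*}
```

Under those assumptions the address splits as [23-bit tag, 6-bit index, 3-bit block offset]. With word addresses instead of byte addresses, the block offset would shrink to 1 bit.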
i-Cache and d-Cache
• There usually are two separate caches for instructions and data. Why?
  – Avoids structural hazards in pipelining
  – The combined cache is twice as big but still has the access time of a small cache
  – Allows both caches to operate in parallel, for twice the bandwidth
Handling i-Cache Misses
1. Stall the pipeline and send the address of the missed instruction to the memory
2. Instruct memory to perform a read; wait for the access to complete
3. Update the cache
4. Restart the instruction, this time fetching it successfully from the cache
d-Cache misses are even easier, but still require a pipeline stall
Cache Replacement
• How do you decide which cache block to replace?
• If the cache is direct-mapped, it’s easy: each block can go in only one place
• Otherwise, common strategies:
  – Random
  – Least Recently Used (LRU)
  – Other strategies are used at lower levels of the hierarchy. More on those later.
LRU Replacement
• Replace the block that hasn’t been used for the longest time.
Reference stream: A B C D B D E B A C B C E D C B
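
The reference stream can be traced with a small simulation. The slide does not give the cache size, so this sketch assumes a fully associative cache holding 4 blocks; `simulate_lru` is a hypothetical helper name.

```python
from collections import OrderedDict


def simulate_lru(stream, capacity):
    """Run a reference stream through a fully associative LRU cache.

    Returns (hits, misses, evicted), where evicted lists the victims in order.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    misses = 0
    evicted = []
    for block in stream:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                victim, _ = cache.popitem(last=False)  # drop least recently used
                evicted.append(victim)
            cache[block] = True
    return hits, misses, evicted


hits, misses, evicted = simulate_lru("ABCDBDEBACBCEDCB", capacity=4)
print(hits, misses, evicted)  # 8 8 ['A', 'C', 'D', 'A']
```

With the assumed capacity of 4, the first four references are misses that fill the cache, and each later miss evicts whichever block was touched longest ago.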
LRU Implementations
• LRU is very difficult to implement for high degrees of associativity
• 4-way approximation:
  – 1 bit to indicate least recently used pair
  – 1 bit per pair to indicate least recently used item in this pair
• Much more complex approximations at lower levels of the hierarchy
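
The 4-way approximation uses three bits in total. A minimal sketch follows, assuming ways 0 and 1 form one pair and ways 2 and 3 the other; the class name `PseudoLRU4` and its methods are invented for this illustration.

```python
class PseudoLRU4:
    """Three-bit LRU approximation for one 4-way set."""

    def __init__(self):
        self.lru_pair = 0          # which pair was least recently used
        self.lru_in_pair = [0, 0]  # per pair: which item was least recently used

    def touch(self, way):
        """Record an access to way 0..3."""
        pair, item = divmod(way, 2)
        self.lru_pair = 1 - pair           # the other pair is now the older one
        self.lru_in_pair[pair] = 1 - item  # the sibling is now the older item

    def victim(self):
        """Return the way to replace."""
        pair = self.lru_pair
        return 2 * pair + self.lru_in_pair[pair]


plru = PseudoLRU4()
for way in (0, 1, 2, 3):
    plru.touch(way)
print(plru.victim())  # 0, which matches true LRU here
plru.touch(0)
print(plru.victim())  # 2, although true LRU would pick way 1
```

The second result shows why this is only an approximation: touching way 0 marks its whole pair as recent, so way 1 is protected even though it is the oldest block.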
The Three C’s of Caches
• Three reasons for cache misses:
  – Compulsory miss: item has never been in the cache
  – Capacity miss: item has been in the cache, but space was tight and it was forced out (occurs even with fully associative caches)
  – Conflict miss: item was in the cache, but the cache was not associative enough, so it was forced out (never occurs with fully associative caches)
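
One common way to tell the three kinds apart in a simulation is to run a fully associative LRU cache of the same size alongside the real cache: a miss that the fully associative cache would also take is counted as capacity, and one it would have avoided is counted as conflict. The sketch below follows that convention, which is an assumption rather than something the slides specify; it also assumes block addresses as input and LRU replacement within each set. `classify_misses` is a hypothetical name.

```python
from collections import OrderedDict


def classify_misses(block_addresses, num_blocks, associativity):
    """Count hits and the three kinds of misses for a set-associative LRU cache."""
    num_sets = num_blocks // associativity
    sets = [OrderedDict() for _ in range(num_sets)]
    reference = OrderedDict()  # fully associative LRU cache of the same size
    seen = set()               # every block referenced so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_addresses:
        # Update the fully associative reference cache first.
        ref_hit = block in reference
        if ref_hit:
            reference.move_to_end(block)
        else:
            if len(reference) >= num_blocks:
                reference.popitem(last=False)
            reference[block] = True

        # Then access the cache being studied.
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)
            counts["hit"] += 1
        else:
            if len(cache_set) >= associativity:
                cache_set.popitem(last=False)
            cache_set[block] = True
            if block not in seen:
                counts["compulsory"] += 1  # never been in the cache
            elif not ref_hit:
                counts["capacity"] += 1    # even full associativity misses
            else:
                counts["conflict"] += 1    # only the limited associativity misses
        seen.add(block)
    return counts


# Direct-mapped, 4 blocks: blocks 0 and 4 fight over the same row.
print(classify_misses([0, 4, 0, 4], num_blocks=4, associativity=1))
# {'hit': 0, 'compulsory': 2, 'capacity': 0, 'conflict': 2}
```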
Eliminating Cache Misses
• Which cache parameters (cache size, block size, associativity) can you change to eliminate each of the following kinds of misses?
  – compulsory
  – capacity
  – conflict
Multi-Level Caches
• Use each level of the memory hierarchy as a cache over the next lower level
• Inserting level 2 between levels 1 and 3 allows:
  – level 1 to have a higher miss rate (so it can be smaller and cheaper)
  – level 3 to have a larger access time (so it can be slower and cheaper)
• The new effective access time equation:
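
The equation itself is missing from these notes. A commonly used form for three levels is sketched here; the symbols are assumptions introduced for this sketch, with $t_i$ the access time of level $i$ and $m_i$ the local miss rate of level $i$ (the fraction of accesses reaching level $i$ that miss there):

```latex
t_{\text{eff}} = t_1 + m_1 \left( t_2 + m_2 \, t_3 \right)
```

Every access pays $t_1$; the fraction $m_1$ that misses in level 1 also pays $t_2$; and of those, the fraction $m_2$ that misses in level 2 pays $t_3$ as well.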
Which cache system is better?
• 32 KB unified data and instruction cache
  – hit rate of 97%
• 16 KB data cache
  – hit rate of 92%
• And 16 KB instruction cache
  – hit rate of 98%
• Assume
  – 20% of instructions are loads or stores
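
One way to compare the two designs is misses per instruction. The sketch below assumes every instruction makes one instruction fetch, and loads and stores make one additional data access; the miss penalty is not given on the slide, so it is left out. The function names are hypothetical.

```python
def misses_per_instruction_unified(hit_rate, mem_fraction):
    """Unified cache: every fetch and every data access goes to the same cache."""
    accesses_per_instruction = 1 + mem_fraction
    return accesses_per_instruction * (1 - hit_rate)


def misses_per_instruction_split(i_hit_rate, d_hit_rate, mem_fraction):
    """Split caches: fetches go to the i-Cache, loads and stores to the d-Cache."""
    return (1 - i_hit_rate) + mem_fraction * (1 - d_hit_rate)


unified = misses_per_instruction_unified(0.97, 0.20)
split = misses_per_instruction_split(0.98, 0.92, 0.20)
print(round(unified, 4), round(split, 4))  # 0.036 0.036
```

Under these assumptions both designs miss about 0.036 times per instruction, so miss rate alone does not separate them. The split design can still serve an instruction fetch and a data access at the same time, which the unified cache cannot do without a second port.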
Cache Parameters and Tradeoffs
• If you are designing a cache, what choices do you have and what are their tradeoffs?
Cache Comparisons
Processor      L1 i-Cache                         L1 d-Cache                         L2 unified cache
Alpha 21164    8 KB, direct-mapped, 32 B block    8 KB, direct-mapped, 32 B block    96 KB, 3-way, 64 B block, on chip
MIPS R10000    32 KB, 2-way (LRU), 64 B block     32 KB, 2-way (LRU), 32 B block     (not listed)
Pentium Pro    8 KB, 4-way, 32 B block            8 KB, 2-way, 32 B block            256 KB, 4-way, 32 B block, same package
UltraSparc 1   16 KB, pseudo 2-way, 32 B block    16 KB, direct-mapped, 32 B block   (not listed)
Summary: Classifying Caches
• Where can a block be placed?
  – Direct mapped, Set/Fully associative
• How is a block found?
  – Direct mapped: by index
  – Set associative: by index and search
  – Fully associative: by search
• What happens on a write access?
  – Write-back or Write-through
• Which block should be replaced?
  – Random
  – LRU (Least Recently Used)