CSCI-365 Computer Organization
Lecture 30

Note: Some slides and/or pictures in the following are adapted from Computer Organization and Design, Patterson & Hennessy, © 2005.

Miss Rate vs. Block Size (figure)

Block Size Considerations
• Larger blocks should reduce the miss rate
  – Due to spatial locality
• But in a fixed-size cache:
  – Larger blocks → fewer of them
    • More competition → increased miss rate
  – Larger blocks → pollution
• Larger miss penalty
  – Can override the benefit of the reduced miss rate
  – Early restart and critical-word-first can help
(The sketch below works through this tradeoff numerically.)
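
To make the tradeoff concrete, here is a minimal sketch of average memory access time (AMAT = hit time + miss rate × miss penalty); every number in it is assumed purely for illustration, not taken from the slides:

```python
# AMAT = hit time + miss rate * miss penalty, evaluated for several block
# sizes. All miss rates and penalties below are assumed, illustrative values:
# the miss rate falls with block size until competition/pollution raises it
# again, while the miss penalty keeps growing with block size.
hit_time = 1  # cycles

# (block size in bytes, assumed miss rate, assumed miss penalty in cycles)
configs = [
    (16,  0.070,  40),
    (32,  0.050,  55),
    (64,  0.040,  85),
    (128, 0.045, 145),  # miss rate rises again: fewer blocks, pollution
]

for block_size, miss_rate, miss_penalty in configs:
    amat = hit_time + miss_rate * miss_penalty
    print(f"{block_size:4d}-byte blocks: AMAT = {amat:.2f} cycles")
```

With these made-up numbers the minimum AMAT lands at 32-byte blocks: exactly the "sweet spot" behavior the next slide's plots describe.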

Block Size Tradeoff Conclusions (figure: three plots against block size)
• Miss rate: falls at first as larger blocks exploit spatial locality, then rises once fewer blocks compromise temporal locality
• Miss penalty: grows with block size
• Average access time: has a minimum; at large block sizes the increased miss penalty and miss rate dominate

Cache Misses
• On a cache hit, the CPU proceeds normally
• On a cache miss:
  – Stall the CPU pipeline
  – Fetch the block from the next level of the hierarchy
  – Instruction cache miss: restart the instruction fetch
  – Data cache miss: complete the data access
(A sketch of this stall-fetch-restart flow follows.)
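
A minimal sketch of this flow for a direct-mapped cache read; the data structures and names are hypothetical, chosen only to illustrate the stall, fetch, and restart sequence:

```python
# Hypothetical direct-mapped cache model: on a miss, the "pipeline" stalls
# while the block is fetched from the next level, then the access is retried.
BLOCK_WORDS = 4
NUM_SETS = 256
MISS_PENALTY = 100  # assumed cycles to fetch a block from the next level

cache = [{"valid": False, "tag": None, "data": None} for _ in range(NUM_SETS)]
stall_cycles = 0

def next_level_fetch(block_addr):
    # Stand-in for main memory / L2: return the whole block at once.
    return [f"word@{block_addr * BLOCK_WORDS + i}" for i in range(BLOCK_WORDS)]

def cache_read(addr):
    global stall_cycles
    block_addr = addr // BLOCK_WORDS
    index = block_addr % NUM_SETS
    tag = block_addr // NUM_SETS
    line = cache[index]
    if not (line["valid"] and line["tag"] == tag):
        # Miss: stall, fetch the block, then restart the access.
        stall_cycles += MISS_PENALTY
        line.update(valid=True, tag=tag, data=next_level_fetch(block_addr))
    return line["data"][addr % BLOCK_WORDS]

print(cache_read(7), cache_read(6), "stalls:", stall_cycles)  # miss, then hit
```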

Types of Cache Misses
• Compulsory misses
  – First access to a block: when the program starts, nothing is loaded yet
• Conflict misses
  – Two (or more) needed blocks map to the same cache location
  – Eliminated by a fully associative cache
• Capacity misses
  – Not enough room in the cache to hold everything the program needs
  – Can be fixed by a bigger cache
(The sketch below classifies misses along these lines.)
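
One common way to make this taxonomy operational (a sketch, not the slides' method) is to replay a block-address trace against an infinite cache, a fully associative LRU cache of the same size, and the real cache:

```python
from collections import OrderedDict

# Hypothetical "3C" classifier for a block-address trace: compulsory =
# first-ever reference; capacity = also misses in a fully associative LRU
# cache of the same size; conflict = everything else that misses.
CACHE_BLOCKS = 4

def classify(trace):
    seen = set()                  # models an infinite cache
    fully_assoc = OrderedDict()   # fully associative LRU, CACHE_BLOCKS lines
    direct = {}                   # index -> tag for the real direct-mapped cache
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in trace:
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            fully_assoc[block] = True
            if len(fully_assoc) > CACHE_BLOCKS:
                fully_assoc.popitem(last=False)   # evict LRU
        index, tag = block % CACHE_BLOCKS, block // CACHE_BLOCKS
        dm_hit = direct.get(index) == tag
        direct[index] = tag
        if dm_hit:
            continue              # hit in the real cache: not a miss at all
        if block not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        seen.add(block)
    return counts

print(classify([0, 4, 0, 4, 0, 4]))  # 0 and 4 collide in a 4-block DM cache
```

In the example trace, blocks 0 and 4 map to the same index, so after two compulsory misses every further miss is a conflict miss: full associativity would have eliminated them.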

Write-Through
• On a data write, we could just update the block in the cache
  – But then the cache and memory would be inconsistent
• Write-through: also update memory
• But this makes writes take longer
  – e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
    Effective CPI = 1 + 0.1 × 100 = 11
• Solution: write buffer
  – Holds data waiting to be written to memory
  – CPU continues immediately
  – Only stalls on a write if the write buffer is already full
(A sketch of this buffering follows the list.)
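
A minimal, hypothetical model of the write buffer's behavior; the depth, latency, and names are assumptions chosen for illustration:

```python
from collections import deque

# Hypothetical write-buffer model: stores go into a small FIFO and the CPU
# only stalls when the buffer is full; memory drains one entry per write
# latency in the background.
WRITE_BUFFER_DEPTH = 4
MEM_WRITE_CYCLES = 100   # assumed memory write latency

buffer = deque()
drain_countdown = 0
stall_cycles = 0

def tick():
    """Advance memory by one cycle, draining the buffer in the background."""
    global drain_countdown
    if buffer:
        drain_countdown += 1
        if drain_countdown == MEM_WRITE_CYCLES:
            buffer.popleft()
            drain_countdown = 0

def store(addr, value):
    global stall_cycles
    while len(buffer) >= WRITE_BUFFER_DEPTH:   # stall only when full
        tick()
        stall_cycles += 1
    buffer.append((addr, value))

for i in range(8):   # a burst of 8 stores overflows the 4-entry buffer
    store(i, i)
print("stall cycles from the burst:", stall_cycles)
```

The first four stores complete with no stalls at all; only once the buffer is full does each further store pay the full memory latency, which is exactly the benefit the slide describes.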

Write-Back
• Alternative: on a data write, just update the block in the cache
  – Keep track of whether each block is dirty
• When a dirty block is replaced:
  – Write it back to memory
  – Can use a write buffer to allow the replacing block to be read first
(See the dirty-bit sketch below.)
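
A minimal sketch of the dirty-bit bookkeeping for a single cache line (names and structure assumed for illustration):

```python
# Hypothetical write-back line replacement: a dirty bit marks modified lines,
# and only dirty victims are written back to memory on eviction.
memory_writes = 0

line = {"valid": False, "tag": None, "data": None, "dirty": False}

def write_back_if_dirty():
    global memory_writes
    if line["valid"] and line["dirty"]:
        memory_writes += 1          # the victim goes back to memory
        line["dirty"] = False

def replace(tag, data):
    write_back_if_dirty()           # write the victim out before reuse
    line.update(valid=True, tag=tag, data=data, dirty=False)

def cpu_write(tag, data):
    if not (line["valid"] and line["tag"] == tag):
        replace(tag, data)
    line["data"] = data
    line["dirty"] = True            # modified only in the cache for now

cpu_write(1, "A")   # miss, clean victim: no memory write
cpu_write(1, "B")   # hit: still no memory write
cpu_write(2, "C")   # miss, dirty victim: exactly one write-back
print("memory writes:", memory_writes)   # -> 1
```

Note the contrast with write-through: three CPU writes cost only one memory write, because repeated writes to the same block are absorbed by the dirty line.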

Example: Intrinsity FastMATH
• Embedded MIPS processor
  – 12-stage pipeline
  – Instruction and data access on each cycle
• Split cache: separate I-cache and D-cache
  – Each 16 KB: 256 blocks × 16 words/block
  – D-cache: write-through or write-back
• SPEC2000 miss rates
  – I-cache: 0.4%
  – D-cache: 11.4%
  – Weighted average: 3.2% (see the note on the weighting below)
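
The weighted average combines the two miss rates by each cache's share of all memory accesses. The instruction/data mix below is an assumption (roughly the SPEC2000 load/store frequency); it lands near, though not exactly at, the quoted 3.2%:

```python
# Combine I-cache and D-cache miss rates, weighted by each cache's share of
# all memory accesses. The data-access frequency is an assumed value for
# illustration (about the SPEC2000 load/store frequency).
i_miss, d_miss = 0.004, 0.114
data_accesses_per_instr = 0.36   # assumed loads + stores per instruction

total = 1.0 + data_accesses_per_instr          # 1 I-fetch per instruction
combined = (1.0 * i_miss + data_accesses_per_instr * d_miss) / total
print(f"combined miss rate ≈ {combined:.1%}")  # ≈ 3.3%, near the quoted 3.2%
```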

Example: Intrinsity FastMATH (figure)

Measuring Cache Performance
• Components of CPU time
  – Program execution cycles (includes cache hit time)
  – Memory stall cycles (mainly from cache misses)
• With simplifying assumptions:
  Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                      = (Instructions / Program) × (Misses / Instruction) × Miss penalty
(A small numeric sketch follows.)
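
A minimal sketch that plugs assumed numbers into this formula (none of the values below come from the slides):

```python
# Memory stall cycles = memory accesses × miss rate × miss penalty.
# All inputs are assumed values, just to exercise the formula.
def memory_stall_cycles(accesses, miss_rate, miss_penalty):
    return accesses * miss_rate * miss_penalty

instructions = 1_000_000
accesses_per_instr = 1.36   # 1 I-fetch plus an assumed 0.36 loads/stores
stalls = memory_stall_cycles(instructions * accesses_per_instr, 0.02, 100)
print(f"{stalls:,.0f} stall cycles")   # 2,720,000 for this made-up program
```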

Calculating Cache Performance
• Example: assume the instruction cache miss rate for a program is 2% and the data cache miss rate is 4%. If the processor has a CPI of 2 without any memory stalls and the miss penalty is 100 cycles for all misses, determine how much faster the processor would run with a perfect cache that never missed. Use the instruction frequencies for SPECint2000.
• What happens if the processor (but not memory) is made faster? (DONE IN CLASS; a sketch of the arithmetic follows)
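
Since the solution was done in class, here is a sketch of the arithmetic, assuming the SPECint2000 load/store frequency of 36% that Patterson & Hennessy use for this example:

```python
# Worked version of the example above (assuming 36% loads/stores, the
# SPECint2000 frequency used in Patterson & Hennessy).
base_cpi = 2.0
miss_penalty = 100
i_miss_rate, d_miss_rate = 0.02, 0.04
loads_stores_per_instr = 0.36

i_stalls = i_miss_rate * miss_penalty                           # 2.00 / instr
d_stalls = loads_stores_per_instr * d_miss_rate * miss_penalty  # 1.44 / instr
cpi_with_stalls = base_cpi + i_stalls + d_stalls                # 5.44
speedup = cpi_with_stalls / base_cpi                            # 2.72
print(f"CPI with stalls = {cpi_with_stalls:.2f}, "
      f"perfect cache is {speedup:.2f}x faster")

# If the processor (but not memory) is made faster, say base CPI halves to 1,
# the fixed 3.44 stall cycles per instruction dominate even more:
faster_cpi = 1.0 + i_stalls + d_stalls                          # 4.44
print(f"faster CPU: perfect-cache speedup = {faster_cpi / 1.0:.2f}x")
```

The takeaway matches the in-class discussion: speeding up the processor without speeding up memory makes the memory stalls a larger fraction of execution time, so the relative penalty of a non-perfect cache grows.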