CSCE 430/830 Computer Architecture: Review of Memory Hierarchy

CSCE 430/830 Computer Architecture
Review of Memory Hierarchy & Storage
Lecturer: Prof. Hong Jiang
Fall, 2008
Portions of these slides are derived from: Dave Patterson © UCB
CSCE 430/830 Review of Mem. Hierarchy

The Principle of Locality
• The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality:
  – Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  – Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
• For the last 15 years, HW has relied on locality for speed
Locality is a property of programs that is exploited in machine design.
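
A rough way to see spatial locality is to count how often consecutive accesses land in different cache blocks. The sketch below assumes an 8 x 8 array of 8-byte elements stored in row-major order and 64-byte blocks; the function names and all sizes are illustrative choices, not taken from the slides.

```python
def block_trace(rows, cols, elem_size, block_size, row_major=True):
    """Block numbers touched while walking a rows x cols array.

    The array is assumed to be stored in row-major order starting at
    byte offset 0; only the traversal order changes.
    """
    trace = []
    outer, inner = (rows, cols) if row_major else (cols, rows)
    for i in range(outer):
        for j in range(inner):
            r, c = (i, j) if row_major else (j, i)
            addr = (r * cols + c) * elem_size  # byte offset from array base
            trace.append(addr // block_size)
    return trace


def block_switches(trace):
    """Count consecutive accesses that fall in different blocks."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a != b)


row_walk = block_trace(8, 8, elem_size=8, block_size=64, row_major=True)
col_walk = block_trace(8, 8, elem_size=8, block_size=64, row_major=False)
print(block_switches(row_walk), block_switches(col_walk))
```

With these sizes each row fills exactly one block, so the row-by-row walk changes block only at row boundaries, while the column-by-column walk changes block on every access.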

Memory Hierarchy - the Big Picture
• Problem: memory is too slow and too small
• Solution: memory hierarchy
Level                                      Speed (ns)          Size (bytes)
Processor (Control, Datapath): Registers   0.25-0.5            <1K
L1 On-Chip Cache, L2 Off-Chip Cache        0.5-25              <16M
Main Memory (DRAM)                         80-250              <16G
Secondary Storage (Disk)                   5,000,000 (5 ms)    >100G

Fundamental Cache Questions
• Q1: Where can a block be placed in the upper level? (Block placement)
• Q2: How is a block found if it is in the upper level? (Block identification)
• Q3: Which block should be replaced on a miss? (Block replacement)
• Q4: What happens on a write? (Write strategy)

Q1: Where can a block be placed in the upper level?
• Block 12 placed in 8-block cache:
  – Fully associative, direct mapped, 2-way set associative
  – S.A. Mapping = (Block Number) Modulo (Number of Sets)
• Fully associative: block 12 can go in any of frames 0-7
• Direct mapped: (12 mod 8) = 4, so frame 4 only
• 2-way set associative: (12 mod 4) = 0, so either frame of set 0
[Figure: cache frames 0-7 above memory blocks 0-31]
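
The mapping rule above can be sketched in a few lines. The function name `candidate_frames` is hypothetical, and the sketch assumes the frames of a set are numbered contiguously.

```python
def candidate_frames(block_number, num_frames, associativity):
    """Return (set index, frames the block may occupy).

    associativity == 1 is direct mapped; associativity == num_frames
    is fully associative. Frames of one set are assumed contiguous.
    """
    num_sets = num_frames // associativity
    set_index = block_number % num_sets  # (Block Number) mod (Number of Sets)
    first = set_index * associativity
    return set_index, list(range(first, first + associativity))


for ways in (1, 2, 8):
    print(ways, candidate_frames(12, num_frames=8, associativity=ways))
```

For block 12 in an 8-frame cache this gives set 4 (frame 4) when direct mapped, set 0 (frames 0 and 1) when 2-way set associative, and all eight frames when fully associative.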

Q2: How is a block found if it is in the upper level?
• Tag on each block
  – No need to check index or block offset
• Increasing associativity shrinks index, expands tag
Address layout: [ Tag | Index ] [ Block Offset ], where Tag and Index together form the Block Address
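
As an illustration of the address layout, the sketch below splits an address into tag, index, and block offset. It assumes power-of-two cache, block, and set counts; the 16 KB cache, 64-byte blocks, and the example address are arbitrary values chosen for the demonstration.

```python
def split_address(addr, cache_bytes, block_bytes, associativity):
    """Return (tag, index, offset, index_bits) for a set-associative cache.

    All sizes are assumed to be powers of two.
    """
    num_sets = cache_bytes // (block_bytes * associativity)
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = num_sets.bit_length() - 1      # log2(number of sets)
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset, index_bits


addr = 0x12345678
for ways in (1, 4):
    tag, index, offset, index_bits = split_address(addr, 16 * 1024, 64, ways)
    print(ways, hex(tag), index, offset, index_bits)
```

Going from direct mapped to 4-way drops the index from 8 bits to 6 and lengthens the tag by the same two bits, which is the "shrinks index, expands tag" effect.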

Q3: Which block should be replaced on a miss?
• Easy for Direct Mapped
• Set Associative or Fully Associative:
  – Random
  – LRU (Least Recently Used)
Miss rates by cache size and associativity:
Assoc:    2-way            4-way            8-way
Size      LRU     Ran      LRU     Ran      LRU     Ran
16 KB     5.2%    5.7%     4.7%    5.3%     4.4%    5.0%
64 KB     1.9%    2.0%     1.5%    1.7%     1.4%    1.5%
256 KB    1.15%   1.17%    1.13%   1.13%    1.12%   1.12%
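
A minimal LRU sketch, assuming a cache addressed by block number and using Python's `OrderedDict` to keep each set in recency order. The function name and the access trace are made up for illustration and are not related to the measured miss rates above.

```python
from collections import OrderedDict


def lru_misses(trace, num_sets, ways):
    """Count misses for a block-number trace under LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used
            lines[block] = True
    return misses


trace = [0, 8, 0, 16, 8, 0]  # hypothetical trace; all blocks collide in set 0
print(lru_misses(trace, num_sets=8, ways=1))  # direct mapped
print(lru_misses(trace, num_sets=4, ways=2))  # 2-way set associative
print(lru_misses(trace, num_sets=1, ways=8))  # fully associative
```

On this trace the same 8 frames produce fewer misses as associativity grows, because the colliding blocks stop evicting one another.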

Q4: What happens on a write?
                                              Write-Through    Write-Back
Debug                                         Easy             Hard
Do read misses produce writes?                No               Yes
Do repeated writes make it to lower level?    Yes              No
• Write-Through policy: data written to the cache block is also written to lower-level memory
• Write-Back policy: write data only to the cache; update the lower level when a block falls out of the cache
• Additional option (on miss): let writes to an un-cached address allocate a new cache line (“write-allocate”).
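
The "repeated writes" row can be illustrated with a deliberately simplified count. The sketch assumes every written block stays cached until one final flush (no evictions in between), and the function name and the write sequence are hypothetical.

```python
def lower_level_writes(writes, policy):
    """Count writes reaching the lower level for a sequence of block writes.

    Simplifying assumption: the cache is large enough that dirty blocks
    are only written back once, when they finally leave the cache.
    """
    if policy == "write-through":
        return len(writes)       # every store is propagated
    if policy == "write-back":
        return len(set(writes))  # one write-back per dirty block
    raise ValueError(f"unknown policy: {policy}")


stores = [3, 3, 3, 7, 3]  # block numbers written by the CPU
print(lower_level_writes(stores, "write-through"))
print(lower_level_writes(stores, "write-back"))
```

Under these assumptions the five stores cost five lower-level writes with write-through but only two with write-back.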

Set Associative Cache Design
• Key idea:
  – Divide cache into sets
  – Allow block anywhere in a set
• Advantages:
  – Better hit rate
• Disadvantages:
  – More tag bits
  – More hardware
  – Higher access time
[Figure: A Four-Way Set-Associative Cache]

Cache Performance Measures
• Hit rate: fraction found in the cache
  – So high that we usually talk about Miss rate = 1 - Hit rate
• Hit time: time to access the cache
• Miss penalty: time to replace a block from lower level, including time to replace in CPU
  – access time: time to access lower level
  – transfer time: time to transfer block
• Average memory-access time (AMAT) = Hit time + Miss rate x Miss penalty (ns or clocks)
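
A small worked example of the AMAT formula. The 1-cycle hit time and 100-cycle miss penalty are assumed values picked for illustration; the 5.2% miss rate is simply one plausible input.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory-access time = Hit time + Miss rate x Miss penalty.

    hit_time and miss_penalty must use the same unit (ns or clocks).
    """
    return hit_time + miss_rate * miss_penalty


# Assumed numbers: 1-cycle hit, 5.2% miss rate, 100-cycle miss penalty.
print(amat(hit_time=1, miss_rate=0.052, miss_penalty=100))
```

With these inputs the average access costs about 6.2 cycles, so the misses dominate even though nearly 95% of accesses hit.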

Cache performance
• Miss-oriented Approach to Memory Access:
  – CPI_Execution includes ALU and Memory instructions
• Separating out Memory component entirely
  – AMAT = Average Memory Access Time
  – CPI_ALUOps does not include memory instructions
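
The equations for these two approaches did not survive in this transcript. The sketch below uses the standard textbook forms, which is an assumption about what the slide showed; the parameter names and the sample numbers are illustrative.

```python
def cpu_time_miss_oriented(ic, cpi_execution, mem_accesses_per_inst,
                           miss_rate, miss_penalty, cycle_time):
    """CPU time = IC x (CPI_Execution
                        + MemAccess/Inst x MissRate x MissPenalty) x CycleTime.

    CPI_Execution covers ALU and memory instructions assuming cache hits.
    """
    stall_cpi = mem_accesses_per_inst * miss_rate * miss_penalty
    return ic * (cpi_execution + stall_cpi) * cycle_time


def cpu_time_amat(ic, alu_ops_per_inst, cpi_alu_ops,
                  mem_accesses_per_inst, amat, cycle_time):
    """CPU time = IC x (AluOps/Inst x CPI_ALUOps
                        + MemAccess/Inst x AMAT) x CycleTime.

    CPI_ALUOps excludes memory instructions; AMAT is in cycles.
    """
    return ic * (alu_ops_per_inst * cpi_alu_ops
                 + mem_accesses_per_inst * amat) * cycle_time


# Assumed workload: 1M instructions, 1 ns cycle, 1.3 memory accesses/inst.
print(cpu_time_miss_oriented(1e6, 1.1, 1.3, 0.02, 50, 1e-9))
print(cpu_time_amat(1e6, 0.5, 1.0, 1.3, 2.0, 1e-9))
```

The two calls use unrelated sample inputs, so their results are not meant to match; each only shows how the terms of one formulation combine.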

Details of Page Table
[Figure: the virtual address is split into a virtual page no. and a 12-bit offset. The Page Table Base Reg and the page no. index into the page table, which is located in physical memory. Each entry holds V, Access Rights, and PA (the physical frame). The physical address is the physical page (frame) no. followed by the same 12-bit offset.]
• Page table maps virtual page numbers to physical frames (“PTE” = Page Table Entry)
• Virtual memory => treat memory as a cache for disk

Page tables may not fit in memory!
• A table for 4 KB pages for a 32-bit address space has 1M entries
• Each process needs its own address space!
Two-level Page Tables
• 32-bit virtual address:
  – bits 31-22: P1 index
  – bits 21-12: P2 index
  – bits 11-0: Page Offset
• Top-level table wired in main memory
• Subset of 1024 second-level tables in main memory; rest are on disk or unallocated
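
The field boundaries above can be checked with a short sketch. The function name and the example address are arbitrary; the bit widths follow the 10/10/12 split on the slide.

```python
PAGE_OFFSET_BITS = 12  # bits 11-0, 4 KB pages
P2_BITS = 10           # bits 21-12
P1_BITS = 10           # bits 31-22


def split_virtual_address(va):
    """Split a 32-bit virtual address into (P1 index, P2 index, offset)."""
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    p2 = (va >> PAGE_OFFSET_BITS) & ((1 << P2_BITS) - 1)
    p1 = (va >> (PAGE_OFFSET_BITS + P2_BITS)) & ((1 << P1_BITS) - 1)
    return p1, p2, offset


# A flat table needs one entry per virtual page: 2^32 / 2^12 = 2^20 = 1M.
flat_entries = 2**32 // 2**PAGE_OFFSET_BITS
print(flat_entries)
print([hex(field) for field in split_virtual_address(0x12345678)])
```

Each 10-bit index selects one of 1024 entries, so the top-level table has 1024 entries and points to at most 1024 second-level tables of 1024 entries each.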

The TLB caches page table entries
• Physical and virtual pages must be the same size!
• TLB caches page table entries.
[Figure: a virtual address (page, off) is looked up in the TLB, whose entries map a page for the current ASID to a physical frame address and are filled from the page table; the result is the physical address (frame, off)]
• MIPS handles TLB misses in software (random replacement). Other machines use hardware.
• V=0 pages either reside on disk or have not yet been allocated. OS handles V=0 (“Page fault”).
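
A minimal sketch of the lookup path, assuming 4 KB pages, a TLB modeled as an unbounded dictionary (so no replacement policy), and a page table in which a missing entry stands for V=0. `translate` and `PageFault` are hypothetical names.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


class PageFault(Exception):
    """Raised when the page table entry is invalid (V=0)."""


def translate(va, tlb, page_table):
    """Translate a virtual address, filling the TLB on a miss."""
    vpn, offset = divmod(va, PAGE_SIZE)
    frame = tlb.get(vpn)
    if frame is None:                # TLB miss: consult the page table
        frame = page_table.get(vpn)  # a missing entry models V=0
        if frame is None:
            raise PageFault(vpn)     # the OS would handle this
        tlb[vpn] = frame             # cache the page table entry
    return frame * PAGE_SIZE + offset


page_table = {2: 5}  # virtual page 2 lives in physical frame 5
tlb = {}
print(hex(translate(2 * PAGE_SIZE + 12, tlb, page_table)))
print(tlb)
```

The first access misses in the TLB and walks the page table; a second access to the same page would be served from `tlb` directly.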

Virtually Indexed, Physically Tagged Cache
What is the motivation?
• Fast cache hit by parallel TLB access
• No virtual cache shortcomings
How could it be correct?
• Require cache way size <= page size; now the physical index comes from the page offset
• Then virtual and physical indices are identical ⇒ works like a physically indexed cache!
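
The way-size condition is easy to check numerically. The sketch below takes way size to be cache size divided by associativity; the function name and the cache configurations are made-up examples.

```python
def vipt_index_is_physical(cache_bytes, associativity, page_bytes):
    """True when the index and block-offset bits fit inside the page offset.

    Way size = cache size / associativity. If it does not exceed the page
    size, the index bits are the same in virtual and physical addresses.
    """
    way_bytes = cache_bytes // associativity
    return way_bytes <= page_bytes


# Hypothetical configurations with 4 KB pages.
print(vipt_index_is_physical(32 * 1024, 8, 4096))  # 4 KB ways
print(vipt_index_is_physical(32 * 1024, 2, 4096))  # 16 KB ways
```

The first configuration satisfies the constraint; the second would need index bits from above the page offset, so virtual and physical indices could differ.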

Summary #1/3: The Cache Design Space
• Several interacting dimensions
  – cache size
  – block size
  – associativity
  – replacement policy
  – write-through vs write-back
  – write allocation
• The optimal choice is a compromise
  – depends on access characteristics
    » workload
    » use (I-cache, D-cache, TLB)
  – depends on technology / cost
• Simplicity often wins
[Figure: design space with axes Cache Size, Associativity, and Block Size; a Good/Bad curve trading Factor A against Factor B from Less to More]

Summary #2/3: Caches
• The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.
    » Temporal Locality: Locality in Time
    » Spatial Locality: Locality in Space
• Three Major Categories of Cache Misses:
  – Compulsory Misses: sad facts of life. Example: cold start misses.
  – Capacity Misses: increase cache size
  – Conflict Misses: increase cache size and/or associativity. Nightmare Scenario: ping pong effect!
• Write Policy: Write Through vs. Write Back
• Today CPU time is a function of (ops, cache misses) vs. just f(ops): affects Compilers, Data structures, and Algorithms

Summary #3/3: TLB, Virtual Memory
• Page tables map virtual addresses to physical addresses
• TLBs are important for fast translation
• TLB misses are significant in processor performance
  – funny times, as most systems can’t access all of the 2nd-level cache without TLB misses!
• Caches, TLBs, and Virtual Memory can all be understood by examining how they deal with 4 questions:
  1) Where can a block be placed?
  2) How is a block found?
  3) What block is replaced on a miss?
  4) How are writes handled?
• Today VM allows many processes to share a single memory without having to swap all processes to disk; today VM protection is more important than memory hierarchy benefits, but computers are still insecure
• Prepare for debate + quiz on Wednesday

Summary of Virtual Machine Monitor
• Virtual Machine Revival
  – Overcome security flaws of modern OSes
  – Processor performance no longer highest priority
  – Manage Software, Manage Hardware
• “… VMMs give OS developers another opportunity to develop functionality no longer practical in today’s complex and ossified operating systems, where innovation moves at geologic pace.” [Rosenblum and Garfinkel, 2005]
• Virtualization challenges for processor, virtual memory, I/O
  – Paravirtualization, ISA upgrades to cope with those difficulties
• Xen as example VMM using paravirtualization
  – 2005 performance on non-I/O bound, I/O intensive apps: 80% of native Linux without driver VM, 34% with driver VM
• Opteron memory hierarchy still critical to performance

Disk Device Performance
[Figure: disk anatomy showing Platter, Spindle, Outer Track, Inner Track, Sector, Head, Arm, Actuator, Controller]
• Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead
• Seek Time? Depends on the number of tracks the arm must move and the seek speed of the disk
• Rotation Time? Depends on how fast the disk rotates and how far the sector is from the head
• Transfer Time? Depends on the data rate (bandwidth) of the disk (bit density) and the size of the request
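
A worked example of the latency sum. All drive parameters below are assumed for illustration, the average rotational delay is taken as half a revolution (a common convention, not stated on the slide), and 1 MB is treated as 1000 KB.

```python
def disk_latency_ms(seek_ms, rpm, transfer_mb_per_s, request_kb, controller_ms):
    """Seek + Rotation + Transfer + Controller overhead, in milliseconds."""
    rotation_ms = 0.5 * 60_000 / rpm               # half a revolution on average
    transfer_ms = request_kb / transfer_mb_per_s   # KB / (MB/s) = ms at 1000 KB per MB
    return seek_ms + rotation_ms + transfer_ms + controller_ms


# Assumed drive: 5 ms seek, 10,000 RPM, 40 MB/s, 4 KB request, 0.2 ms controller.
print(disk_latency_ms(5.0, 10_000, 40.0, 4.0, 0.2))
```

With these inputs the total is about 8.3 ms, of which 8 ms is mechanical (seek plus rotation) and only 0.1 ms is data transfer.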

Redundant Arrays of (Inexpensive) Disks
• Files are "striped" across multiple disks
• Redundancy yields high data availability
  – Availability: service still provided to user, even if some components failed
• Disks will still fail
• Contents reconstructed from data redundantly stored in the array
  – Capacity penalty to store redundant info
  – Bandwidth penalty to update redundant info

Summary: RAID Techniques
Goal was performance; popularity is due to reliability of storage
• Disk Mirroring, Shadowing (RAID 1)
  – Each disk is fully duplicated onto its "shadow"
  – Logical write = two physical writes
  – 100% capacity overhead
• Parity Data Bandwidth Array (RAID 3)
  – Parity computed horizontally
  – Logically a single high data bw disk
• High I/O Rate Parity Array (RAID 5)
  – Interleaved parity blocks
  – Independent reads and writes
  – Logical write = 2 reads + 2 writes
[Figure: example bit patterns stored on the disks of each array]
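
Parity-based reconstruction and the "2 reads + 2 writes" small write can be sketched with byte-wise XOR. This assumes simple XOR parity across equal-sized blocks; the function names and the data bytes are invented for the example.

```python
from functools import reduce


def parity(blocks):
    """Byte-wise XOR across equal-sized blocks (parity computed horizontally)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors and parity."""
    return parity(list(surviving_blocks) + [parity_block])


def small_write_parity(old_data, new_data, old_parity):
    """New parity after updating one block.

    Needs 2 reads (old data, old parity) and is followed by
    2 writes (new data, new parity).
    """
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]  # made-up stripe of three blocks
p = parity(data)
print(reconstruct([data[0], data[2]], p) == data[1])

new_block = b"\x00\xff"
new_p = small_write_parity(data[1], new_block, p)
print(new_p == parity([data[0], new_block, data[2]]))
```

The small-write shortcut gives the same parity as recomputing over the whole stripe, without having to read the other data disks.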