Lecture 7 TLB
Virtual Memory Approaches
• Time Sharing
• Static Relocation
• Base+Bounds
• Segmentation
• Paging
Basic Paging
• Flexible address space
  • don’t need to find contiguous RAM
  • doesn’t waste whole data pages (valid bit)
• Easy to manage
  • fixed size pages
  • simple free list for unused pages
  • no need to coalesce
• Too slow
• Too big
Page Mapping with Linear Page Table
[Figure: virtual memory of P1, P2, and P3 mapped onto physical pages 0-11]
Page tables (indexed by VPN, each entry is a PFN):
  P1: 3, 1, 7, 10
  P2: 0, 4, 2, 6
  P3: 8, 5, 9, 11
Where are Page Tables Stored?
• The size of a typical page table?
  • assume 32-bit address space
  • assume 4 KB pages
  • assume 4-byte entries (or this could be less)
  • 2^(32 - log2(4 KB)) * 4 bytes = 2^20 * 4 bytes = 4 MB
• Store in memory, and CPU finds it via registers
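The size calculation above can be checked with a few lines of Python. This is only a sketch: the function name and parameters are made up for illustration, and it assumes the page size is a power of two.

```python
def linear_page_table_bytes(addr_bits, page_bytes, pte_bytes):
    """Size in bytes of a linear page table with one PTE per virtual page."""
    # For a power-of-two page size, log2(page_bytes) is bit_length() - 1.
    offset_bits = page_bytes.bit_length() - 1
    num_pages = 2 ** (addr_bits - offset_bits)   # one entry per virtual page
    return num_pages * pte_bytes

# The slide's assumptions: 32-bit addresses, 4 KB pages, 4-byte PTEs.
size = linear_page_table_bytes(addr_bits=32, page_bytes=4 * 1024, pte_bytes=4)
print(size // (1024 * 1024), "MB")   # prints "4 MB"
```

Note that this is the cost per process, since each process has its own page table.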
Memory Accesses
  0x0010  movl 0x1100, %r8d
  0x0014  addl $0x3, %r8d
  0x0017  movl %r8d, 0x1100
PT (indexed by VPN): 2, 0, 80, 99
• Assume 4 KB pages
• Assume PTBR is 0x5000
• Assume PTEs are 4 bytes
Resulting physical accesses:
  PT, load from 0x5000
  Fetch instruction at 0x2010
  PT, load from 0x5004
  Exec, load from 0x0100
  …
TOO SLOW
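To see why this is too slow, the trace can be reproduced with a small simulation. The sketch below uses the slide's numbers (4 KB pages, PTBR at 0x5000, 4-byte PTEs, page table 2, 0, 80, 99); the function names and the "fetch"/"load"/"store" labels are illustrative choices, not part of the lecture.

```python
PAGE_SIZE = 4096               # 4 KB pages (from the slide)
PTBR = 0x5000                  # page-table base register (from the slide)
PTE_SIZE = 4                   # bytes per PTE (from the slide)
PAGE_TABLE = [2, 0, 80, 99]    # VPN -> PFN (from the slide)

def access(vaddr, purpose, trace):
    """Record the two physical accesses that one virtual reference costs."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    trace.append(("PT", PTBR + vpn * PTE_SIZE))                  # read the PTE
    trace.append((purpose, PAGE_TABLE[vpn] * PAGE_SIZE + offset))  # real access

def trace_three_instructions():
    trace = []
    access(0x0010, "fetch", trace)   # movl 0x1100, %r8d
    access(0x1100, "load", trace)
    access(0x0014, "fetch", trace)   # addl $0x3, %r8d
    access(0x0017, "fetch", trace)   # movl %r8d, 0x1100
    access(0x1100, "store", trace)
    return trace

for kind, paddr in trace_three_instructions():
    print(f"{kind:5s} {paddr:#06x}")
```

Under these assumptions the three instructions generate ten physical memory accesses, and half of them are page-table reads.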
Other Information in Page Table
• What other data should go in page table entries besides translation?
  • valid bit
  • protection bits
  • present bit
  • reference bit
  • dirty bit
  // Extract the VPN from the virtual address
  VPN = (VirtualAddress & VPN_MASK) >> SHIFT

  // Form the address of the page-table entry (PTE)
  PTEAddr = PTBR + (VPN * sizeof(PTE))

  // Fetch the PTE
  PTE = AccessMemory(PTEAddr)

  // Check if process can access the page
  if (PTE.Valid == False)
      RaiseException(SEGMENTATION_FAULT)
  else if (CanAccess(PTE.ProtectBits) == False)
      RaiseException(PROTECTION_FAULT)
  else
      // Access is OK: form physical address and fetch it
      offset   = VirtualAddress & OFFSET_MASK
      PhysAddr = (PTE.PFN << PFN_SHIFT) | offset
      Register = AccessMemory(PhysAddr)
Translation Steps
H/W: for each mem reference:
  1. extract VPN (virt page num) from VA (virt addr)
  2. calculate addr of PTE (page table entry)
  3. fetch PTE
  4. extract PFN (page frame num)
  5. build PA (phys addr)
  6. fetch PA to register
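The translation steps can be sketched as a small Python function. Several things here are assumptions for illustration: the PTE class, the single writable flag standing in for the protection bits, the exception names, 4 KB pages, and the extra invalid fifth entry. The first four PFNs reuse P1's page table from the earlier slide. Step 6, the actual fetch, is left out.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                        # assumes 4 KB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

class SegmentationFault(Exception):
    pass

class ProtectionFault(Exception):
    pass

@dataclass
class PTE:
    pfn: int
    valid: bool = True
    writable: bool = True              # stand-in for the protection bits

def translate(page_table, vaddr, write=False):
    vpn = vaddr >> PAGE_SHIFT          # 1. extract VPN
    pte = page_table[vpn]              # 2-3. locate and fetch the PTE
    if not pte.valid:
        raise SegmentationFault(hex(vaddr))
    if write and not pte.writable:
        raise ProtectionFault(hex(vaddr))
    # 4-5. extract the PFN and build the physical address
    return (pte.pfn << PAGE_SHIFT) | (vaddr & OFFSET_MASK)

# P1's mappings (3, 1, 7, 10); the read-only and invalid entries are made up.
P1 = [PTE(3), PTE(1, writable=False), PTE(7), PTE(10), PTE(0, valid=False)]

print(hex(translate(P1, 0x2004)))      # VPN 2 -> PFN 7, prints 0x7004
```

A write to 0x1100 would raise ProtectionFault under these made-up bits, and any access to 0x4000 would raise SegmentationFault.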
A Memory Trace
  int array[1000];
  ...
  for (i = 0; i < 1000; i++)
      array[i] = 0;

  0x1024  movl $0x0, (%edi,%eax,4)
  0x1028  incl %eax
  0x102c  cmpl $0x03e8, %eax
  0x1030  jne 0x1024
Array Iterator
  int sum = 0;
  for (i = 0; i < 10; i++) {
      sum += a[i];
  }
• What is the memory trace?
Basic strategy
• Take advantage of repetition.
• Use a CPU cache.
[Figure: the TLB sits in the CPU; the page table (PT) sits in RAM]
TLB Cache Type
• Fully-Associative: entries can go anywhere
  • most common for TLBs
  • must store whole key/value in cache
  • search all in parallel
• There are other general cache types
TLB Contents
VPN | PFN | other bits
• TLB valid bit
  • whether the entry has a valid translation
• TLB protection bits
  • rwx
• Address Space Identifier
• TLB dirty bit
A MIPS TLB Entry
  // Address translation with a TLB
  VPN = (VirtualAddress & VPN_MASK) >> SHIFT
  (Success, TlbEntry) = TLB_Lookup(VPN)
  if (Success == True)    // TLB Hit
      if (CanAccess(TlbEntry.ProtectBits) == True)
          Offset   = VirtualAddress & OFFSET_MASK
          PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
          AccessMemory(PhysAddr)
      else
          RaiseException(PROTECTION_FAULT)
  else                    // TLB Miss
      PTEAddr = PTBR + (VPN * sizeof(PTE))
      PTE = AccessMemory(PTEAddr)
      if (PTE.Valid == False)
          RaiseException(SEGMENTATION_FAULT)
      else if (CanAccess(PTE.ProtectBits) == False)
          RaiseException(PROTECTION_FAULT)
      else
          TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
          RetryInstruction()
Array Iterator with TLB
  int sum = 0;
  for (i = 0; i < 10; i++) {
      sum += a[i];
  }
• How many TLB hits?
• How many TLB misses?
• Hit rate? Miss rate?
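The answers depend on details the slide leaves open: where the array starts, the page size, and the element size. The sketch below counts hits and misses for the accesses to a[i] only (it ignores instruction fetches and the accesses to sum and i) and assumes the TLB starts empty and never evicts. The base addresses and page sizes in the two calls are assumptions picked for illustration.

```python
def tlb_stats(base, n, elem_size, page_size):
    """Return (hits, misses) for accesses to a[0..n-1], starting with an empty TLB."""
    cached = set()          # VPNs currently held in the TLB (no evictions modeled)
    hits = misses = 0
    for i in range(n):
        vpn = (base + i * elem_size) // page_size
        if vpn in cached:
            hits += 1
        else:
            misses += 1     # first touch of this page
            cached.add(vpn)
    return hits, misses

# Tiny 16-byte pages, 4-byte ints, array starting at virtual address 100:
print(tlb_stats(base=100, n=10, elem_size=4, page_size=16))       # (7, 3)

# 4 KB pages with a page-aligned array: the whole array fits in one page.
print(tlb_stats(base=0x1000, n=10, elem_size=4, page_size=4096))  # (9, 1)
```

With the first set of assumptions the hit rate is 70%; with 4 KB pages it rises to 90%, because more elements share each translation.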
Reasoning about TLB
• Workload: series of loads/stores (memory accesses)
• TLB: chooses which entries to store in the CPU
• Metric: performance (i.e., hit rate)
TLB Workloads
• Spatial locality
  • Sequential array accesses can almost always hit in the TLB, and so are very fast!
• Temporal locality
• What pattern would be slow?
  • highly random, with no repeat accesses
TLB Replacement Policies
• LRU: evict the least-recently used entry when a TLB slot is needed
• Random: randomly choose entries to evict
• When is each better?
  • Sometimes random is better than a “smart” policy!
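One workload where random beats LRU is a loop over one more page than the TLB can hold. The sketch below compares the two policies on that pattern. The 4-slot TLB, the 5-page loop, and the fixed seed are assumptions chosen to make the effect visible, not measurements from a real machine.

```python
import random
from collections import OrderedDict

def lru_hits(accesses, slots):
    tlb = OrderedDict()                  # keys ordered least to most recently used
    hits = 0
    for vpn in accesses:
        if vpn in tlb:
            hits += 1
            tlb.move_to_end(vpn)         # mark as most recently used
        else:
            if len(tlb) == slots:
                tlb.popitem(last=False)  # evict the least-recently used entry
            tlb[vpn] = True
    return hits

def random_hits(accesses, slots, seed=0):
    rng = random.Random(seed)            # seeded so runs are repeatable
    tlb = []
    hits = 0
    for vpn in accesses:
        if vpn in tlb:
            hits += 1
        elif len(tlb) < slots:
            tlb.append(vpn)
        else:
            tlb[rng.randrange(slots)] = vpn   # evict a random entry
    return hits

loop = [i % 5 for i in range(1000)]      # cycle over 5 pages, 1000 accesses
print("LRU hits:   ", lru_hits(loop, 4))
print("Random hits:", random_hits(loop, 4))
```

LRU gets zero hits here because it always evicts exactly the page that the loop needs next, while random eviction usually keeps that page around.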
Who Handles The TLB Miss?
• H/W or OS?
• H/W: CPU must know where page tables are
  • CR3 on x86
  • Page table structure not flexible
• OS: CPU traps into OS upon TLB miss
  // H/W-managed TLB: on a miss, the hardware walks the page table itself
  VPN = (VirtualAddress & VPN_MASK) >> SHIFT
  (Success, TlbEntry) = TLB_Lookup(VPN)
  if (Success == True)    // TLB Hit
      if (CanAccess(TlbEntry.ProtectBits) == True)
          Offset   = VirtualAddress & OFFSET_MASK
          PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
          AccessMemory(PhysAddr)
      else
          RaiseException(PROTECTION_FAULT)
  else                    // TLB Miss
      PTEAddr = PTBR + (VPN * sizeof(PTE))
      PTE = AccessMemory(PTEAddr)
      if (PTE.Valid == False)
          RaiseException(SEGMENTATION_FAULT)
      else if (CanAccess(PTE.ProtectBits) == False)
          RaiseException(PROTECTION_FAULT)
      else
          TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
          RetryInstruction()
  // OS-managed TLB: on a miss, the hardware only raises an exception
  VPN = (VirtualAddress & VPN_MASK) >> SHIFT
  (Success, TlbEntry) = TLB_Lookup(VPN)
  if (Success == True)    // TLB Hit
      if (CanAccess(TlbEntry.ProtectBits) == True)
          Offset   = VirtualAddress & OFFSET_MASK
          PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
          AccessMemory(PhysAddr)
      else
          RaiseException(PROTECTION_FAULT)
  else                    // TLB Miss
      RaiseException(TLB_MISS)
OS TLB Miss Handler
• OS: CPU traps into OS upon TLB miss
  1. check page table for page table entry
  2. if valid, extract PFN and update TLB with a special instruction
  3. return from trap
• Where to resume execution?
  • The instruction that caused the trap
• How to avoid double traps?
  • keep TLB miss handlers in physical memory
  • reserve some entries in the TLB for permanently-valid translations
• Modifying TLB entries is privileged
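The trap-and-retry flow can be modeled in a few lines. In this sketch a Python exception stands in for the hardware trap, a dict stands in for the TLB, and the page table reuses P1's mappings from the earlier slide. All names are hypothetical and 4 KB pages are assumed.

```python
PAGE_SHIFT = 12                          # assumes 4 KB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

class TLBMiss(Exception):
    """Stands in for the TLB_MISS trap raised by the hardware."""
    def __init__(self, vpn):
        super().__init__(f"TLB miss on VPN {vpn}")
        self.vpn = vpn

def hw_translate(tlb, vaddr):
    """Hardware side: look in the TLB only; trap on a miss."""
    vpn = vaddr >> PAGE_SHIFT
    if vpn not in tlb:
        raise TLBMiss(vpn)
    return (tlb[vpn] << PAGE_SHIFT) | (vaddr & OFFSET_MASK)

def os_tlb_miss_handler(tlb, page_table, vpn):
    """OS side: 1. check the page table, 2. if valid, update the TLB."""
    pfn = page_table.get(vpn)
    if pfn is None:
        raise LookupError(f"no valid mapping for VPN {vpn}")
    tlb[vpn] = pfn                       # models the privileged TLB-update instruction

def reference(tlb, page_table, vaddr):
    """Retry the same reference after each trap; return (paddr, traps taken)."""
    traps = 0
    while True:
        try:
            return hw_translate(tlb, vaddr), traps
        except TLBMiss as miss:
            traps += 1
            os_tlb_miss_handler(tlb, page_table, miss.vpn)   # 3. return from trap

tlb = {}
p1_page_table = {0: 3, 1: 1, 2: 7, 3: 10}
print(reference(tlb, p1_page_table, 0x2004))   # first touch: one trap
print(reference(tlb, p1_page_table, 0x2008))   # same page again: no trap
```

The retry loop reflects the point about where to resume: after the handler returns, the instruction that caused the trap runs again and this time hits in the TLB.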
Context Switches
• What happens if a process uses the cached TLB entries from another process?
• Solutions?
  • Flush TLB on each switch
  • Remember which entries are for each process
    • Address Space Identifier
Address Space Identifier
• You can think of the ASID as a process identifier (PID), but usually it has fewer bits
Page tables (indexed by VPN, each entry is a PFN):
  P1 (ASID 11): 3, 1, 7, 10
  P2 (ASID 12): 0, 4, 2, 6
TLB:
  valid  VPN  PFN  ASID
  0      -    -    -
  1      1    1    ?
  1      1    4    ?
  1      0    3    ?
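One way to picture an ASID-tagged TLB is as a lookup keyed by the pair (ASID, VPN) instead of the VPN alone. The class below is a hypothetical sketch of that idea, not how any particular hardware stores its entries. The mappings inserted come from the two page tables on the slide.

```python
class AsidTLB:
    """A TLB whose entries are tagged with an address space identifier."""

    def __init__(self):
        self.entries = {}                # (asid, vpn) -> pfn

    def insert(self, asid, vpn, pfn):
        self.entries[(asid, vpn)] = pfn

    def lookup(self, asid, vpn):
        """Return the PFN, or None on a miss for this address space."""
        return self.entries.get((asid, vpn))

tlb = AsidTLB()
tlb.insert(asid=11, vpn=1, pfn=1)        # P1's page table maps VPN 1 -> PFN 1
tlb.insert(asid=12, vpn=1, pfn=4)        # P2's page table maps VPN 1 -> PFN 4

# The same VPN translates differently depending on which process is running,
# so no flush is needed on a context switch.
print(tlb.lookup(11, 1), tlb.lookup(12, 1))   # prints "1 4"
print(tlb.lookup(12, 0))                      # prints "None": P2 misses on VPN 0
```

Because the ASID is part of the key, a process can never hit on a translation that another process inserted.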
Next time: solving the “too big” problem