Virtual Memory Address Translation CISC 360 Computer Architecture

Virtual Memory Address Translation
CISC 360: Computer Architecture
April 5th, 2016


Today
• Virtual Memory
  • as a tool for caching
  • as a tool for memory management
  • as a tool for memory protection
• Address translation
• Speedups
• Examples / Problems
• Case study: Core i7/Linux memory system

A System Using Physical Addressing
[Figure: the CPU sends physical address (PA) 4 directly to main memory (locations 0 to M-1), which returns the data word.]
• Used in “simple” systems like embedded microcontrollers in devices like cars, elevators, and digital picture frames

A System Using Virtual Addressing
[Figure: on the CPU chip, the CPU sends virtual address (VA) 4100 to the MMU, which translates it to physical address (PA) 4; main memory (locations 0 to M-1) returns the data word.]
• Used in all modern servers, desktops, and laptops
• One of the great ideas in computer science

Why Virtual Memory (VM)?
• Uses main memory efficiently
  • Uses DRAM as a cache for parts of a virtual address space
• Simplifies memory management
  • Each process gets the same uniform linear address space
• Isolates address spaces
  • One process can’t interfere with another’s memory
  • User program cannot access privileged kernel information

VM as a Tool for Caching
• Virtual memory is an array of N contiguous bytes stored on disk
• The contents of the array on disk are cached in physical memory (DRAM cache)
  • These cache blocks are called pages (size is $P = 2^p$ bytes)
[Figure: virtual memory holds virtual pages VP 0 to VP $2^{n-p}-1$ (addresses 0 to N-1), each unallocated, cached, or uncached; physical memory holds physical pages PP 0 to PP $2^{m-p}-1$ (addresses 0 to M-1), some of them empty. Virtual pages (VPs) are stored on disk; physical pages (PPs) are cached in DRAM.]

Page Tables
• A page table is an array of page table entries (PTEs) that maps virtual pages to physical pages
  • Per-process kernel data structure in DRAM
[Figure: a memory-resident page table (PTE 0 to PTE 7) in DRAM; each PTE holds a valid bit and either a physical page number, a disk address, or null. Physical memory (PP 0 to PP 3) holds VP 1, VP 2, VP 7, and VP 4; virtual memory on disk holds VP 1, VP 2, VP 3, VP 4, VP 6, and VP 7.]

Page Hit
• Page hit: reference to VM word that is in physical memory (DRAM cache hit)
[Figure: the virtual address indexes the memory-resident page table; the referenced page is cached in DRAM.]

Page Fault
• Page fault: reference to VM word that is not in physical memory (DRAM cache miss)
[Figure: the virtual address indexes the memory-resident page table; the referenced page is stored on disk only.]

Handling Page Fault
• Page miss causes page fault (an exception)
• Page fault handler selects a victim to be evicted (here VP 4)
• Offending instruction is restarted: page hit!
[Figure, shown in successive builds: before the fault, physical memory (PP 0 to PP 3) holds VP 1, VP 2, VP 7, and VP 4; after the handler runs, VP 3 has replaced VP 4 in physical memory and the page table has been updated to match.]
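The three steps on this slide can be sketched in a few lines of Python. This is a toy model, not how any real kernel is written: the `access` function, the dictionary used as a page table, and the `choose_victim` callback are all names invented here, and write-back of dirty pages and unallocated pages are left out.

```python
def access(page_table, vpn, choose_victim):
    """Return (ppn, outcome) for a reference to virtual page `vpn`.

    page_table maps VPN -> PPN for resident pages and VPN -> None for
    pages that live only on disk (unallocated pages are not modeled).
    """
    if page_table[vpn] is not None:
        return page_table[vpn], "hit"       # valid PTE: page hit
    victim = choose_victim(page_table)      # page fault: handler picks a victim
    ppn = page_table[victim]
    page_table[victim] = None               # evict (write-back of dirty pages omitted)
    page_table[vpn] = ppn                   # page in the new page, update its PTE
    return ppn, "fault"


# State from the slide: VP 1, VP 2, VP 7, VP 4 occupy PP 0 to PP 3.
page_table = {1: 0, 2: 1, 3: None, 4: 3, 6: None, 7: 2}
first = access(page_table, 3, lambda pt: 4)    # victim VP 4, as on the slide
second = access(page_table, 3, lambda pt: 4)   # the restarted instruction
print(first, second)
```

The victim is passed in as a callback because the slide only says which page was chosen (VP 4), not which replacement policy chose it.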

Locality to the Rescue Again!
• Virtual memory works because of locality
• At any point in time, programs tend to access a set of active virtual pages called the working set
  • Programs with better temporal locality will have smaller working sets
• If (working set size < main memory size)
  • Good performance for one process after compulsory misses
• If (SUM(working set sizes) > main memory size)
  • Thrashing: performance meltdown where pages are swapped (copied) in and out continuously
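A small simulation makes the two cases concrete. The sketch below assumes an LRU replacement policy (the slide does not name one) and uses a hypothetical helper, `count_faults`, on a reference string that cycles through a four-page working set.

```python
from collections import OrderedDict


def count_faults(refs, num_frames):
    """Count page faults for a reference string under LRU replacement."""
    frames = OrderedDict()              # resident VPNs, least recently used first
    faults = 0
    for vpn in refs:
        if vpn in frames:
            frames.move_to_end(vpn)     # hit: mark as most recently used
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.popitem(last=False)   # evict the least recently used page
            frames[vpn] = True
    return faults


refs = [0, 1, 2, 3] * 100               # working set of 4 pages, 400 references
fits = count_faults(refs, 4)            # working set fits in memory
thrash = count_faults(refs, 3)          # one frame too few
print(fits, thrash)
```

With four frames only the four compulsory misses occur; with three frames every one of the 400 references faults, which is the thrashing case.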

VM as a Tool for Memory Management
• Key idea: each process has its own virtual address space
  • It can view memory as a simple linear array
  • Mapping function scatters addresses through physical memory
    • Well chosen mappings simplify memory allocation and management
[Figure: address translation maps VP 1 and VP 2 of process 1 and VP 1 and VP 2 of process 2 (each with virtual addresses 0 to N-1) onto PP 2, PP 6, and PP 8 of the physical address space (DRAM, addresses 0 to M-1); one physical page holds, e.g., read-only library code.]

VM as a Tool for Memory Management
• Memory allocation
  • Each virtual page can be mapped to any physical page
  • A virtual page can be stored in different physical pages at different times
• Sharing code and data among processes
  • Map virtual pages to the same physical page (here: PP 6)
[Figure: address translation maps VP 1 and VP 2 of process 1 and VP 1 and VP 2 of process 2 (each with virtual addresses 0 to N-1) onto PP 2, PP 6, and PP 8 of the physical address space (DRAM, addresses 0 to M-1); the shared page PP 6 holds, e.g., read-only library code.]

VM as a Tool for Memory Protection
• Extend page table entries with permission bits
• Page fault handler checks these before remapping
  • If violated, send process SIGSEGV (segmentation fault)
[Figure: page tables for processes i and j with SUP, READ, WRITE, and Address columns. Process i: VP 0 -> PP 6, VP 1 -> PP 4, VP 2 -> PP 2 (SUP: No, No, Yes). Process j: VP 0 -> PP 9, VP 1 -> PP 6, VP 2 -> PP 11 (SUP: No, Yes, No). The physical address space shows PP 2, PP 4, PP 6, PP 8, PP 9, and PP 11.]
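As a rough illustration of the permission check, here is a sketch with a hypothetical `PTE` record carrying the SUP, READ, and WRITE bits. The field names, the `check_access` function, and the example entries are assumptions made for this sketch; they are not values read off the slide's table.

```python
from dataclasses import dataclass


@dataclass
class PTE:
    ppn: int
    sup: bool     # True: page is accessible only in supervisor (kernel) mode
    read: bool
    write: bool


def check_access(pte, op, kernel_mode):
    """Return True if the access is allowed, False if it should raise SIGSEGV."""
    if pte.sup and not kernel_mode:
        return False                    # user code touching a kernel-only page
    return pte.read if op == "read" else pte.write


user_ro = PTE(ppn=6, sup=False, read=True, write=False)   # illustrative entry
kernel_rw = PTE(ppn=2, sup=True, read=True, write=True)   # illustrative entry

print(check_access(user_ro, "read", kernel_mode=False))    # allowed
print(check_access(user_ro, "write", kernel_mode=False))   # denied: read-only
print(check_access(kernel_rw, "read", kernel_mode=False))  # denied: SUP page
```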


VM Address Translation
• Virtual address space
  • $V = \{0, 1, \ldots, N-1\}$
• Physical address space
  • $P = \{0, 1, \ldots, M-1\}$
• Address translation
  • $\mathrm{MAP}: V \rightarrow P \cup \{\emptyset\}$
  • For virtual address $a$:
    • $\mathrm{MAP}(a) = a'$ if data at virtual address $a$ is at physical address $a'$ in $P$
    • $\mathrm{MAP}(a) = \emptyset$ if data at virtual address $a$ is not in physical memory
      • Either invalid or stored on disk

Summary of Address Translation Symbols
• Basic parameters
  • $N = 2^n$: Number of addresses in virtual address space
  • $M = 2^m$: Number of addresses in physical address space
  • $P = 2^p$: Page size (bytes)
• Components of the virtual address (VA)
  • TLBI: TLB index
  • TLBT: TLB tag
  • VPO: Virtual page offset
  • VPN: Virtual page number
• Components of the physical address (PA)
  • PPO: Physical page offset (same as VPO)
  • PPN: Physical page number
  • CO: Byte offset within cache line
  • CI: Cache index
  • CT: Cache tag

Address Translation With a Page Table
• Virtual address: virtual page number (VPN) in bits n-1 to p, virtual page offset (VPO) in bits p-1 to 0
• Page table base register (PTBR): holds the page table address for the process
• Page table: indexed by the VPN; each entry holds a valid bit and a physical page number (PPN)
  • Valid bit = 0: page not in memory (page fault)
• Physical address: physical page number (PPN) in bits m-1 to p, physical page offset (PPO) in bits p-1 to 0
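The bit manipulation described on this slide can be sketched as follows. The function name `translate_va` and the dictionary-based page table are illustrative choices, and the page fault is modeled as a Python exception rather than a hardware exception.

```python
def translate_va(va, page_table, p):
    """Translate a virtual address using a single-level page table.

    page_table maps VPN -> (valid, PPN); p is the number of offset bits.
    """
    vpn = va >> p                        # high-order bits select the PTE
    vpo = va & ((1 << p) - 1)            # low-order p bits pass through unchanged
    valid, ppn = page_table[vpn]
    if not valid:
        raise LookupError("page fault")  # stand-in for the hardware exception
    return (ppn << p) | vpo              # PPN concatenated with PPO (= VPO)


# Hypothetical mapping: VPN 0x5 -> PPN 0x2A, with 4 KB pages (p = 12).
table = {0x5: (1, 0x2A), 0x6: (0, None)}
pa = translate_va(0x5ABC, table, 12)
print(hex(pa))
```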

Address Translation: Page Hit
[Figure: CPU and MMU on the CPU chip, connected to cache/memory; numbered arrows carry (1) VA, (2) PTEA, (3) PTE, (4) PA, and (5) data.]
1) Processor sends virtual address to the memory management unit (MMU)
2-3) MMU fetches PTE from page table in memory
4) MMU sends physical address to cache/memory
5) Cache/memory sends data word to processor

Address Translation: Page Fault
[Figure: CPU and MMU on the CPU chip, connected to cache/memory and disk; numbered arrows carry (1) VA, (2) PTEA, (3) PTE, (4) exception to the page fault handler, (5) victim page to disk, (6) new page from disk, and (7) return to the CPU.]
1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in memory
4) Valid bit is zero, so MMU triggers page fault exception
5) Handler identifies victim (and, if dirty, pages it out to disk)
6) Handler pages in new page and updates PTE in memory
7) Handler returns to original process, restarting faulting instruction

Integrating VM and Cache
[Figure: the MMU sends the PTEA to the L1 cache; on a PTEA hit the cache returns the PTE, on a PTEA miss the PTE is fetched from memory. The MMU then sends the PA to the L1 cache; on a PA hit the cache returns the data, on a PA miss the data is fetched from memory.]
VA: virtual address, PA: physical address, PTE: page table entry, PTEA: PTE address

Speeding up Translation with a TLB
• Page table entries (PTEs) are cached in L1 like any other memory word
  • PTEs may be evicted by other data references
  • PTE hit still requires a small L1 delay
• Solution: Translation Lookaside Buffer (TLB)
  • Small hardware cache in MMU
  • Maps virtual page numbers to physical page numbers
  • Contains complete page table entries for a small number of pages

TLB Hit
[Figure: (1) CPU sends VA to MMU; (2) MMU sends VPN to TLB; (3) TLB returns PTE; (4) MMU sends PA to cache/memory; (5) cache/memory returns data to CPU.]
A TLB hit eliminates a memory access

TLB Miss
[Figure: (1) CPU sends VA to MMU; (2) MMU sends VPN to TLB; (3) MMU sends PTEA to cache/memory; (4) PTE is returned to the TLB; (5) MMU sends PA to cache/memory; (6) cache/memory returns data to CPU.]
A TLB miss incurs an additional memory access (the PTE)
Fortunately, TLB misses are rare. Why?
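One way to explore the question is to count misses for a simple access pattern. The sketch below assumes a fully associative, 64-entry TLB with LRU replacement and 4 KB pages; these parameters and the `tlb_misses` helper are choices made for this example, not a description of any particular processor.

```python
from collections import OrderedDict


def tlb_misses(addresses, page_bits=12, entries=64):
    """Count TLB misses for a sequence of virtual addresses (LRU TLB)."""
    tlb = OrderedDict()                 # cached VPNs, least recently used first
    misses = 0
    for va in addresses:
        vpn = va >> page_bits
        if vpn in tlb:
            tlb.move_to_end(vpn)
        else:
            misses += 1
            if len(tlb) == entries:
                tlb.popitem(last=False)
            tlb[vpn] = True
    return misses


scan = range(0, 1 << 20, 8)             # sequential 8-byte reads over 1 MiB
misses = tlb_misses(scan)
print(misses, len(scan), misses / len(scan))
```

A sequential scan touches each page many times before moving on, so only the first reference to each page misses: 256 misses out of 131072 references here. Spatial locality is what keeps the miss rate low.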

Multi-Level Page Tables
• Suppose:
  • 4 KB ($2^{12}$) page size, 48-bit address space, 8-byte PTE
• Problem:
  • Would need a 512 GB page table!
    • $2^{48} \cdot 2^{-12} \cdot 2^{3} = 2^{39}$ bytes
• Common solution:
  • Multi-level page tables
  • Example: 2-level page table
    • Level 1 table: each PTE points to a page table (always memory resident)
    • Level 2 table: each PTE points to a page (paged in and out like any other data)
[Figure: a Level 1 table whose entries point to Level 2 tables.]
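The 512 GB figure can be checked with a couple of lines of arithmetic. The function name below is made up for this sketch; the parameters are the ones given on the slide.

```python
def flat_page_table_bytes(va_bits, page_bits, pte_bytes):
    """Size of a single-level page table covering the whole address space."""
    num_pages = 1 << (va_bits - page_bits)   # one PTE per virtual page
    return num_pages * pte_bytes


size = flat_page_table_bytes(va_bits=48, page_bits=12, pte_bytes=8)
print(size, size // 2**30, "GB")
```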

Simple Memory System Examples

Simple Memory System Example
• Addressing
  • 14-bit virtual addresses
  • 12-bit physical addresses
  • Page size = 64 bytes
[Figure: virtual address bits 13 to 6 form the virtual page number (VPN) and bits 5 to 0 the virtual page offset (VPO); physical address bits 11 to 6 form the physical page number (PPN) and bits 5 to 0 the physical page offset (PPO).]

Simple Memory System Page Table
Only the first 16 entries (out of 256) are shown

VPN  PPN  Valid    VPN  PPN  Valid
00   28   1        08   13   1
01   –    0        09   17   1
02   33   1        0A   09   1
03   02   1        0B   –    0
04   –    0        0C   –    0
05   16   1        0D   2D   1
06   –    0        0E   11   1
07   –    0        0F   0D   1

Simple Memory System TLB
• 16 entries
• 4-way associative
[Figure: within the virtual address, the TLB tag (TLBT) is bits 13 to 8 and the TLB index (TLBI) is bits 7 to 6; together they form the VPN (bits 13 to 6). The VPO is bits 5 to 0.]

Set  Tag  PPN  Valid   Tag  PPN  Valid   Tag  PPN  Valid   Tag  PPN  Valid
0    03   –    0       09   0D   1       00   –    0       07   02   1
1    03   2D   1       02   –    0       04   –    0       0A   –    0
2    02   –    0       08   –    0       06   –    0       03   –    0
3    07   –    0       03   0D   1       0A   34   1       02   –    0

Simple Memory System Cache
• 16 lines, 4-byte block size
• Physically addressed
• Direct mapped
[Figure: within the physical address, the cache tag (CT) is bits 11 to 6 (the PPN), the cache index (CI) is bits 5 to 2, and the cache offset (CO) is bits 1 to 0; CI and CO together form the PPO.]

Idx  Tag  Valid  B0  B1  B2  B3     Idx  Tag  Valid  B0  B1  B2  B3
0    19   1      99  11  23  11     8    24   1      3A  00  51  89
1    15   0      –   –   –   –      9    2D   0      –   –   –   –
2    1B   1      00  02  04  08     A    2D   1      93  15  DA  3B
3    36   0      –   –   –   –      B    0B   0      –   –   –   –
4    32   1      43  6D  8F  09     C    12   0      –   –   –   –
5    0D   1      36  72  F0  1D     D    16   1      04  96  34  15
6    31   0      –   –   –   –      E    13   1      83  77  1B  D3
7    16   1      11  C2  DF  03     F    14   0      –   –   –   –

Address Translation Example #1
Virtual address: 0x03D4
  Bits 13 to 0: 00001111 010100 (VPN | VPO)
  VPN: 0x0F   TLBI: 0x3   TLBT: 0x03   TLB hit? Y   Page fault? N   PPN: 0x0D
Physical address
  Bits 11 to 0: 001101 0101 00 (CT | CI | CO)
  CO: 0   CI: 0x5   CT: 0x0D   Hit? Y   Byte: 0x36

Address Translation Example #2
Virtual address: 0x0B8F
  Bits 13 to 0: 00101110 001111 (VPN | VPO)
  VPN: 0x2E   TLBI: 0x2   TLBT: 0x0B   TLB hit? N   Page fault? Y   PPN: TBD
Physical address
  CO: ___   CI: ___   CT: ____   Hit? __   Byte: ____

Address Translation Example #3
Virtual address: 0x0020
  Bits 13 to 0: 00000000 100000 (VPN | VPO)
  VPN: 0x00   TLBI: 0   TLBT: 0x00   TLB hit? N   Page fault? N   PPN: 0x28
Physical address
  Bits 11 to 0: 101000 1000 00 (CT | CI | CO)
  CO: 0   CI: 0x8   CT: 0x28   Hit? N   Byte: Mem
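The three worked examples can be replayed in code. The sketch below transcribes only the valid entries of the page table, TLB, and cache from the slides; the `lookup` function and its result dictionary are invented for this example. One assumption is labeled in the code: page table entries beyond the 16 shown are treated as invalid, which agrees with the page fault reported in Example #2.

```python
# Valid entries only, transcribed from the slides.
# Assumption: PTEs beyond the 16 shown on the slide are treated as invalid.
PAGE_TABLE = {0x00: 0x28, 0x02: 0x33, 0x03: 0x02, 0x05: 0x16, 0x08: 0x13,
              0x09: 0x17, 0x0A: 0x09, 0x0D: 0x2D, 0x0E: 0x11, 0x0F: 0x0D}
TLB = [{0x09: 0x0D, 0x07: 0x02},        # set 0: tag -> PPN
       {0x03: 0x2D},                    # set 1
       {},                              # set 2 (no valid entries)
       {0x03: 0x0D, 0x0A: 0x34}]        # set 3
CACHE = {0x0: (0x19, [0x99, 0x11, 0x23, 0x11]), 0x2: (0x1B, [0x00, 0x02, 0x04, 0x08]),
         0x4: (0x32, [0x43, 0x6D, 0x8F, 0x09]), 0x5: (0x0D, [0x36, 0x72, 0xF0, 0x1D]),
         0x7: (0x16, [0x11, 0xC2, 0xDF, 0x03]), 0x8: (0x24, [0x3A, 0x00, 0x51, 0x89]),
         0xA: (0x2D, [0x93, 0x15, 0xDA, 0x3B]), 0xD: (0x16, [0x04, 0x96, 0x34, 0x15]),
         0xE: (0x13, [0x83, 0x77, 0x1B, 0xD3])}   # index -> (tag, bytes B0..B3)


def lookup(va):
    """Walk a 14-bit virtual address through the TLB, page table, and cache."""
    vpn, vpo = va >> 6, va & 0x3F                # 64-byte pages: 6 offset bits
    tlbi, tlbt = vpn & 0x3, vpn >> 2             # 4 sets: 2 index bits
    ppn = TLB[tlbi].get(tlbt)
    result = {"vpn": vpn, "tlbi": tlbi, "tlbt": tlbt, "tlb_hit": ppn is not None}
    if ppn is None:
        ppn = PAGE_TABLE.get(vpn)                # TLB miss: consult the page table
    result["page_fault"] = ppn is None
    if ppn is None:
        return result
    pa = (ppn << 6) | vpo
    co, ci, ct = pa & 0x3, (pa >> 2) & 0xF, pa >> 6
    line = CACHE.get(ci)
    hit = line is not None and line[0] == ct
    result.update(pa=pa, co=co, ci=ci, ct=ct, cache_hit=hit,
                  byte=line[1][co] if hit else None)   # None: fetch from memory
    return result


for va in (0x03D4, 0x0B8F, 0x0020):
    print(hex(va), lookup(va))
```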

Intel Core i7 Memory System
• Core x4
  • Registers
  • Instruction fetch
  • L1 d-cache: 32 KB, 8-way
  • L1 i-cache: 32 KB, 8-way
  • L2 unified cache: 256 KB, 8-way
  • MMU (addr translation)
    • L1 d-TLB: 64 entries, 4-way
    • L1 i-TLB: 128 entries, 4-way
    • L2 unified TLB: 512 entries, 4-way
• QuickPath interconnect: 4 links @ 25.6 GB/s each (to other cores, to I/O bridge)
• L3 unified cache: 8 MB, 16-way (shared by all cores)
• DDR3 memory controller: 3 x 64 bit @ 10.66 GB/s, 32 GB/s total (shared by all cores)
• Main memory

Review of Symbols
• Basic parameters
  • $N = 2^n$: Number of addresses in virtual address space
  • $M = 2^m$: Number of addresses in physical address space
  • $P = 2^p$: Page size (bytes)
• Components of the virtual address (VA)
  • TLBI: TLB index
  • TLBT: TLB tag
  • VPO: Virtual page offset
  • VPN: Virtual page number
• Components of the physical address (PA)
  • PPO: Physical page offset (same as VPO)
  • PPN: Physical page number
  • CO: Byte offset within cache line
  • CI: Cache index
  • CT: Cache tag

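End-to-end Core i7 Address Translation
• Virtual address (VA): 36-bit VPN, 12-bit VPO
  • The VPN splits into a 32-bit TLBT and a 4-bit TLBI for the L1 TLB (16 sets, 4 entries/set)
• TLB hit: the TLB supplies the 40-bit PPN
• TLB miss: the VPN splits into VPN1, VPN2, VPN3, and VPN4 (9 bits each); starting from CR3, the page tables are walked one PTE per level to obtain the 40-bit PPN
• Physical address (PA): 40-bit PPN, 12-bit PPO
  • The same bits are viewed by the L1 d-cache (64 sets, 8 lines/set) as a 40-bit CT, 6-bit CI, and 6-bit CO
• L1 hit: the 32/64-bit result goes to the CPU
• L1 miss: the request goes to L2, L3, and main memory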

Core i7 Level 1-3 Page Table Entries
Bit layout when P=1:
  63      XD
  62-52   Unused
  51-12   Page table physical base address
  11-9    Unused
  8       G
  7       PS
  5       A
  4       CD
  3       WT
  2       U/S
  1       R/W
  0       P=1
When P=0: the remaining bits are available for the OS (page table location on disk)

Each entry references a 4K child page table
• P: Child page table present in physical memory (1) or not (0).
• R/W: Read-only or read-write access permission for all reachable pages.
• U/S: User or supervisor (kernel) mode access permission for all reachable pages.
• WT: Write-through or write-back cache policy for the child page table.
• CD: Caching disabled or enabled for the child page table.
• A: Reference bit (set by MMU on reads and writes, cleared by software).
• PS: Page size either 4 KB or 4 MB (defined for Level 1 PTEs only).
• G: Global page (don’t evict from TLB on task switch).
• Page table physical base address: 40 most significant bits of physical page table address (forces page tables to be 4 KB aligned).

Core i7 Level 4 Page Table Entries
Bit layout when P=1:
  63      XD
  62-52   Unused
  51-12   Page physical base address
  11-9    Unused
  8       G
  6       D
  5       A
  4       CD
  3       WT
  2       U/S
  1       R/W
  0       P=1
When P=0: the remaining bits are available for the OS (page location on disk)

Each entry references a 4K child page
• P: Child page is present in memory (1) or not (0)
• R/W: Read-only or read-write access permission for child page
• U/S: User or supervisor mode access
• WT: Write-through or write-back cache policy for this page
• CD: Cache disabled (1) or enabled (0)
• A: Reference bit (set by MMU on reads and writes, cleared by software)
• D: Dirty bit (set by MMU on writes, cleared by software)
• G: Global page (don’t evict from TLB on task switch)
• Page physical base address: 40 most significant bits of physical page address (forces pages to be 4 KB aligned)
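To see how these fields are pulled out of a 64-bit entry, here is a small decoding sketch. The bit positions follow the layout on this slide; the names `FLAG_BITS`, `BASE_MASK`, and `decode_pte`, and the example entry, are made up for illustration.

```python
# Bit positions of the Level 4 PTE flags, as laid out on the slide.
FLAG_BITS = {"P": 0, "R/W": 1, "U/S": 2, "WT": 3, "CD": 4,
             "A": 5, "D": 6, "G": 8, "XD": 63}
BASE_MASK = ((1 << 40) - 1) << 12       # bits 51..12: page physical base address


def decode_pte(pte):
    """Return (flags, base) where base is the 4 KB-aligned physical address."""
    flags = {name: bool((pte >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return flags, pte & BASE_MASK


# Hypothetical entry: page frame 0x12345, present, writable, accessed, dirty.
pte = (0x12345 << 12) | 0b1100011
flags, base = decode_pte(pte)
print(hex(base), [name for name, on in flags.items() if on])
```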

Core i7 Page Table Translation
• Virtual address: VPN1, VPN2, VPN3, VPN4 (9 bits each) and VPO (12 bits)
• CR3 holds the physical address of the L1 page table
• L1 PT (page global directory): VPN1 selects the L1 PTE; 512 GB region per entry; the PTE gives the 40-bit address of the L2 PT
• L2 PT (page upper directory): VPN2 selects the L2 PTE; 1 GB region per entry; the PTE gives the 40-bit address of the L3 PT
• L3 PT (page middle directory): VPN3 selects the L3 PTE; 2 MB region per entry; the PTE gives the 40-bit address of the L4 PT
• L4 PT (page table): VPN4 selects the L4 PTE; 4 KB region per entry; the PTE gives the 40-bit physical address of the page
• The 12-bit VPO is the offset into both the physical and the virtual page
• Physical address: PPN (40 bits) and PPO (12 bits)
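The four-level walk can be sketched with nested dictionaries standing in for the page tables. This is only a model of the indexing: `split_va`, `walk`, and the example mapping are hypothetical, and present bits, permissions, and large pages are ignored.

```python
def split_va(va):
    """Split a 48-bit virtual address into (VPN1, VPN2, VPN3, VPN4, VPO)."""
    vpo = va & 0xFFF                                  # low 12 bits
    vpns = [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]   # 9 bits each
    return (*vpns, vpo)


def walk(root, va):
    """Follow VPN1..VPN4 through nested dicts; return the PA or None on a miss."""
    *vpns, vpo = split_va(va)
    node = root                                       # plays the role of CR3
    for vpn in vpns:
        node = node.get(vpn)
        if node is None:
            return None                               # entry not present
    return (node << 12) | vpo                         # leaf value is the PPN


va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x005
root = {1: {2: {3: {4: 0xABCDE}}}}                    # hypothetical mapping
print(split_va(va), hex(walk(root, va)))
```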

Cute Trick for Speeding Up L1 Access
• Observation
  • Bits that determine CI are identical in virtual and physical address
  • Can index into cache while address translation is taking place
  • Generally we hit in TLB, so PPN bits (CT bits) are available next
  • “Virtually indexed, physically tagged”
  • Cache carefully sized to make this possible
[Figure: the virtual address is VPN (36 bits) plus VPO (12 bits). The VPO passes through with no change as the PPO and supplies CI (6 bits) and CO (6 bits) to the L1 cache, while address translation turns the VPN into the PPN, which serves as the CT for the tag check.]
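The "carefully sized" condition can be checked numerically: the index and offset bits together must not exceed the page offset bits. The sketch below uses the L1 d-cache parameters given elsewhere in these slides (32 KB, 8-way, 64 sets) and assumes 64-byte blocks, which matches the 6-bit CO.

```python
cache_bytes = 32 * 1024     # L1 d-cache size
ways = 8                    # lines per set
block_bytes = 64            # assumed from the 6-bit CO field
page_bytes = 4096           # 4 KB pages

sets = cache_bytes // (ways * block_bytes)
co_bits = block_bytes.bit_length() - 1    # log2 of a power of two
ci_bits = sets.bit_length() - 1
vpo_bits = page_bytes.bit_length() - 1

# Indexing can start before translation finishes only if CI and CO
# fall entirely inside the untranslated page offset.
print(sets, co_bits, ci_bits, vpo_bits, ci_bits + co_bits <= vpo_bits)
```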

Virtual Memory of a Linux Process
[Figure, from high to low addresses:]
• Kernel virtual memory
  • Process-specific data structs (ptables, task and mm structs, kernel stack): different for each process
  • Physical memory: identical for each process
  • Kernel code and data: identical for each process
• Process virtual memory
  • User stack (top marked by %esp)
  • Memory mapped region for shared libraries
  • Runtime heap (malloc) (top marked by brk)
  • Uninitialized data (.bss)
  • Initialized data (.data)
  • Program text (.text), starting at 0x08048000 (32-bit) or 0x00400000 (64-bit)
  • Address 0

Linux Organizes VM as Collection of “Areas”
[Figure: task_struct (field mm) points to an mm_struct (fields pgd and mmap); mmap points to a list of vm_area_structs (fields vm_end, vm_start, vm_prot, vm_flags, vm_next) linked by vm_next. Each vm_area_struct describes one area of process virtual memory: shared libraries, data, and text.]
• pgd:
  • Page global directory address
  • Points to L1 page table
• vm_prot:
  • Read/write permissions for this area
• vm_flags
  • Pages shared with other processes or private to this process

Linux Page Fault Handling
[Figure: a list of vm_area_structs (vm_end, vm_start, vm_prot, vm_flags, vm_next) describing the shared libraries, data, and text areas of process virtual memory, with three example accesses marked 1 to 3.]
1) Read: Segmentation fault: accessing a non-existing page
2) Write: Protection exception: e.g., violating permission by writing to a read-only page (Linux reports as Segmentation fault)
3) Read: Normal page fault
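The three outcomes can be modeled as a search through the list of areas. This is a toy classification, not kernel code: the `Area` tuple, the `classify_fault` function, and all addresses below are invented for the example, and only a single writable flag stands in for vm_prot.

```python
from collections import namedtuple

# Hypothetical stand-in for vm_area_struct: [start, end) plus a writable flag.
Area = namedtuple("Area", "name start end writable")


def classify_fault(areas, addr, is_write):
    """Decide how a faulting access at `addr` would be handled."""
    for area in areas:
        if area.start <= addr < area.end:
            if is_write and not area.writable:
                return "protection exception"    # reported as segmentation fault
            return "normal page fault"           # legal access: page it in
    return "segmentation fault"                  # address lies in no area


areas = [Area("text", 0x400000, 0x401000, writable=False),
         Area("data", 0x600000, 0x602000, writable=True),
         Area("shared libraries", 0x7F0000000000, 0x7F0000100000, writable=False)]

print(classify_fault(areas, 0x500000, is_write=False))   # case 1: no such area
print(classify_fault(areas, 0x400010, is_write=True))    # case 2: write to text
print(classify_fault(areas, 0x600100, is_write=False))   # case 3: read of data
```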