VM Design Issues
Vivek Pai / Kai Li
Princeton University

Mini-Gedankenexperimenten
- What’s the refresh rate of your monitor?
- What is the access time of a hard drive?
- What response time determines sluggishness or speediness?
- What’s the relation?
- What determines the running speed of a program that’s paging heavily?
- If you have a program that pages heavily, what are your options to improve the situation?

Mechanics
- Let’s finish off last lecture
- Memory mapping, Unified VM next time
  - No assigned reading yet, may not exist
- Mid-term on track
  - Covers everything before it
- Open Q&A session?
  - Is there interest?
  - If so, when?

Where We Left Off Last Time
- Various approaches to evicting pages
- Some discussion about why doing even “well” is hard to implement
- Belady’s algorithm for off-line analysis
- We just finished variations on FIFO
  - In particular, enhanced FIFO with 2nd chance

Lessons From Enhanced FIFO
- Observation: it’s easier to evict a clean page than a dirty page
- 2nd observation: sometimes the disk and CPU are idle
- Optimization: when system’s free, write dirty pages back to disk, but don’t evict
- Called flushing – often falls to pager daemon

Least Recently Used (LRU)
- Algorithm
  - Replace the page that hasn’t been used for the longest time
- Question
  - What hardware mechanisms are required to implement LRU?

Implementing LRU
- [Figure: list of pages ordered from most recently used to least recently used: 5 3 4 7 9 11 2 1 15]
- Perfect
  - Use a timestamp on each reference
  - Keep a list of pages ordered by time of reference
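The "perfect" variant can be sketched in a few lines of Python. This is only an illustration: the class and method names are made up here, and an ordered dictionary stands in for the list of pages kept in order of last reference. Real hardware would have to do this bookkeeping on every memory reference, which is what makes exact LRU expensive.

```python
from collections import OrderedDict


class LRUPageList:
    """Exact LRU over a fixed number of frames (hypothetical names)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        # Keys are page numbers; oldest reference first, newest last.
        self.pages = OrderedDict()

    def reference(self, page):
        """Record a reference; return the evicted page, or None."""
        if page in self.pages:
            self.pages.move_to_end(page)  # now the most recently used
            return None
        victim = None
        if len(self.pages) >= self.num_frames:
            # The front of the ordering is the least recently used page.
            victim, _ = self.pages.popitem(last=False)
        self.pages[page] = True
        return victim
```

With three frames, the reference string 1, 2, 3, 1, 4 evicts page 2, because page 1 was touched again before the fault on page 4.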

Approximate LRU
- LRU: N categories, pages in order of last reference (most recently used to least recently used)
- Crude LRU: 2 categories
  - pages referenced since the last page fault
  - pages not referenced since the last page fault
- 8-bit count: 256 categories (0, 1, 2, 3, ..., 254, 255)

Aging: Not Frequently Used (NFU)
- Algorithm
  - Shift reference bits into counters
  - Pick the page with the smallest counter
- Main difference between NFU and LRU?
  - NFU has a short history (counter length)
- How many bits are enough?
  - In practice 8 bits are quite good
- Pros: Require one reference bit
- Cons: Require looking at all counters
- [Figure: example 8-bit counters as reference bits are shifted in]
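A minimal sketch of the shifting step, assuming 8-bit counters and a periodic tick at which the OS samples and clears each page's reference bit. The function names and the dictionary representation are illustrative choices, not anything prescribed by the slides.

```python
COUNTER_BITS = 8  # the slides suggest 8 bits work well in practice


def age(counters, ref_bits):
    """One tick: shift each counter right, insert the reference bit on top."""
    top = 1 << (COUNTER_BITS - 1)
    for page in counters:
        counters[page] >>= 1
        if ref_bits.get(page, 0):
            counters[page] |= top
        ref_bits[page] = 0  # the reference bit is cleared after sampling


def pick_victim(counters):
    """Return the page with the smallest counter (scans every counter)."""
    return min(counters, key=counters.get)
```

A page referenced in the latest tick always outranks one referenced only in earlier ticks, and any history older than 8 ticks is shifted out, which is the "short history" noted above.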

Where Do We Get Storage?
- 32-bit VA to 32-bit PA – no space, right?
- No need to store offset
  - Offset within page is the same
  - 4 KB page = 12 bits of offset
  - Those 12 bits are “free” in PTE
- Page # + other info <= 32 bits
  - Makes storing info easy
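The bit arithmetic can be made concrete with a small sketch. It assumes 32-bit addresses and 4 KB pages as on the slide; the helper names are hypothetical.

```python
PAGE_SHIFT = 12                     # 4 KB page -> 12 bits of offset
LOW_MASK = (1 << PAGE_SHIFT) - 1    # the 12 "free" low bits


def make_pte(frame_number, flags):
    """Pack a 20-bit frame number and up to 12 bits of other info."""
    assert 0 <= frame_number < (1 << (32 - PAGE_SHIFT))
    assert 0 <= flags <= LOW_MASK
    return (frame_number << PAGE_SHIFT) | flags


def translate(pte, virtual_address):
    """Take the frame number from the PTE; copy the offset unchanged."""
    frame_number = pte >> PAGE_SHIFT
    offset = virtual_address & LOW_MASK
    return (frame_number << PAGE_SHIFT) | offset
```

For example, a PTE holding frame 0xABCDE maps virtual address 0x12345678 to physical address 0xABCDE678: only the top 20 bits change.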

x86 Page Table Entry

Bits    Field    Meaning
31-12            Page frame number
11-9    U P Cw   Reserved
8       Gl       Global
7       L        PDE maps 4 MB
6       D        Dirty
5       A        Accessed (referenced)
4       Cd       Cache disabled
3       Wt       Write-through
2       O        Owner (user/kernel)
1       W        Writable
0       V        Valid
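As a rough illustration, the entry can be decoded with a handful of masks. The field abbreviations are the ones on the slide; the bit positions follow the standard 32-bit x86 layout (valid in bit 0 up to global in bit 8), and the function name is made up for this sketch.

```python
# Flag masks, using the slide's abbreviations as keys.
X86_PTE_FLAGS = {
    "V": 0x001,   # valid
    "W": 0x002,   # writable
    "O": 0x004,   # owner (user/kernel)
    "Wt": 0x008,  # write-through
    "Cd": 0x010,  # cache disabled
    "A": 0x020,   # accessed (referenced)
    "D": 0x040,   # dirty
    "L": 0x080,   # PDE maps 4 MB
    "Gl": 0x100,  # global
}


def decode_pte(pte):
    """Return (page frame number, set of flag names that are set)."""
    frame_number = pte >> 12
    flags = {name for name, mask in X86_PTE_FLAGS.items() if pte & mask}
    return frame_number, flags
```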

What Happens on Diagonal Lines
- My screen is 1024*768 pixels
  - 256 colors = 1 byte per pixel = 0.75 MB
  - 64K colors = 2 bytes/pixel = 1.5 MB
- Page size is 4 KB
  - Screen is 192 or 384 pages
  - 1 page = several horizontal lines
- Diagonal/vertical lines = TLB badness
- “Superpages” to the rescue
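The numbers can be checked with a few lines of arithmetic. The sketch assumes a linear, row-major, page-aligned framebuffer, which the slide does not state explicitly; the function names are illustrative.

```python
PAGE_SIZE = 4096
WIDTH, HEIGHT = 1024, 768


def framebuffer_pages(bytes_per_pixel):
    """Number of 4 KB pages the whole screen occupies."""
    return WIDTH * HEIGHT * bytes_per_pixel // PAGE_SIZE


def pages_touched_by_vertical_line(bytes_per_pixel):
    """A full-height vertical line writes one pixel on every row."""
    row_bytes = WIDTH * bytes_per_pixel
    rows_per_page = PAGE_SIZE // row_bytes
    return -(-HEIGHT // rows_per_page)  # ceiling division
```

Under these assumptions a page holds 4 rows at 1 byte per pixel (2 rows at 2 bytes per pixel), so a full-height vertical line touches every page of the framebuffer, while a horizontal line stays within one. That is far more distinct pages than a small TLB can hold at once.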

The Big Picture
- We’ve talked about single evictions
- Most computers are multiprogrammed
  - Single eviction decision still needed
  - New concern – allocating resources
  - How to be “fair enough” and achieve good overall throughput
- This is a competitive world – local and global resource allocation decisions

Program Behaviors
- 80/20 rule
  - > 80% of memory references are made by < 20% of code
- Locality
  - Spatial and temporal
- Working set
  - Keeping a set of pages in memory would avoid a lot of page faults
- [Figure: # page faults versus # pages in memory]

Observations re Working Set
- Working set isn’t static
- There often isn’t a single “working set”
  - Multiple plateaus in previous curve
  - Program coding style affects working set
- Working set is hard to gauge
  - What’s the working set of an interactive program?

Working Set
- Main idea
  - Keep the working set in memory
- An algorithm
  - On a page fault, scan through all pages of the process
  - If the reference bit is 1, record the current time for the page
  - If the reference bit is 0, check the “last use time”
    - If the page has not been used within d, replace the page
    - Otherwise, go to the next
  - Add the faulting page to the working set
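The scan can be sketched as follows. Several details are assumptions of this sketch rather than part of the slide: the reference bit is cleared once its time has been recorded, the scan continues after a victim is found so that every timestamp is refreshed, and the working set simply grows when no page is older than d. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    referenced: bool = False
    last_use: float = 0.0


def working_set_fault(pages, faulting_page, now, d):
    """Handle one page fault; return the replaced Page, or None."""
    victim = None
    for page in pages:
        if page.referenced:
            page.last_use = now      # reference bit 1: record current time
            page.referenced = False  # assumption: clear the bit after sampling
        elif victim is None and now - page.last_use > d:
            victim = page            # not used within d: replace this page
    if victim is not None:
        pages.remove(victim)
    # Add the faulting page to the working set.
    pages.append(Page(faulting_page, referenced=True, last_use=now))
    return victim
```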

WSClock Paging Algorithm
- Follow the clock hand
- If the reference bit is 1, set reference bit to 0, set the current time for the page and go to the next
- If the reference bit is 0, check “last use time”
  - If page has been used within d, go to the next
  - If page hasn’t been used within d and modify bit is 1
    - Schedule the page for page out and go to the next
  - If page hasn’t been used within d and modify bit is 0
    - Replace this page
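One revolution of the clock hand might look like the sketch below. The `Frame` record, the return convention, and the decision to give up after a single full revolution are assumptions made for illustration; a real pager would typically wait for a scheduled write to finish and then retry.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page: int
    referenced: bool
    modified: bool
    last_use: int


def wsclock_select(frames, hand, now, d):
    """Return (victim index or None, new hand position, pages to write out)."""
    scheduled = []
    for _ in range(len(frames)):      # at most one full revolution
        index = hand
        frame = frames[index]
        hand = (hand + 1) % len(frames)
        if frame.referenced:
            frame.referenced = False  # clear the bit, stamp the time
            frame.last_use = now
        elif now - frame.last_use <= d:
            continue                  # used within d: go to the next
        elif frame.modified:
            scheduled.append(frame.page)  # old but dirty: schedule page-out
        else:
            return index, hand, scheduled  # old and clean: replace this page
    return None, hand, scheduled
```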

Simulating Modify Bit with Access Bits
- Set pages read-only if they are read-write
- Use a reserved bit to remember if the page is really read-only
- On a write fault (a write to a page marked read-only)
  - If it is not really read-only, then record a modify in the data structure and change it to read-write
  - Restart the instruction
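A small sketch of the trick, with a software page table entry standing in for the hardware one. The field and function names are invented for the example; the fault handler models the protection fault taken when a store hits a page that has been marked read-only.

```python
from dataclasses import dataclass


@dataclass
class SoftPTE:
    writable: bool            # hardware permission bit
    really_read_only: bool    # reserved bit: the page's true permission
    modified: bool = False    # software-maintained modify bit


def write_protect(pte):
    """Arm the trap: a read-write page is temporarily marked read-only."""
    if pte.writable:
        pte.writable = False


def on_write_fault(pte):
    """Return True if the faulting instruction should be restarted."""
    if pte.really_read_only:
        return False          # genuine protection violation
    pte.modified = True       # first write since the page was protected
    pte.writable = True       # later writes proceed without faulting
    return True
```

Only the first write after protection pays for a fault; the OS re-arms the trap by calling `write_protect` again once it has cleaned the page.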

Implementing LRU without Reference Bit
- Some machines have no reference bit
  - VAX, for example
- Use the valid bit or access bit to simulate
  - Invalidate all valid bits (even if they are valid)
  - Use a reserved bit to remember if a page is really valid
- On a page fault
  - If it is a valid reference, set the valid bit and place the page in the LRU list
  - If it is an invalid reference, do the page replacement
  - Restart the faulting instruction

Demand Paging
- Pure demand paging relies only on faults to bring in pages
- Problems?
  - Possibly lots of faults at startup
  - Ignores spatial locality
- Remedies
  - Loading groups of pages per fault
  - Prefetching/preloading

Speed and Sluggishness
- Slow is > 0.1 seconds (100 ms)
- Speedy is << 0.1 seconds
- Monitors tend to be 60+ Hz = < 16.7 ms between screen paints
- Disks have seek + rotational delay
  - Seek is somewhere between 7 and 16 ms
  - At 7200 rpm, one rotation = 1/120 sec, about 8 ms. Half-rotation is about 4 ms
- Conclusion? One disk access OK, six are bad
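The conclusion follows from simple arithmetic, sketched here with the slide's figures. Treating the average rotational delay as half a rotation and ignoring transfer time are simplifying assumptions; the function names are illustrative.

```python
def rotation_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000 / rpm


def access_ms(seek_ms, rpm):
    """Seek time plus an average rotational delay of half a rotation."""
    return seek_ms + rotation_ms(rpm) / 2


REFRESH_MS = 1000 / 60   # about 16.7 ms between paints at 60 Hz
SLUGGISH_MS = 100        # the "slow" threshold

best = access_ms(7, 7200)    # roughly 11 ms
worst = access_ms(16, 7200)  # roughly 20 ms
```

A single access costs about one screen refresh, well under the 100 ms threshold. Six accesses in a row cost roughly 67 to 121 ms, which reaches or exceeds it when seeks are slow.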

Disk Address
- Use physical memory as a cache for disk
- Where to find a page on a page fault?
  - PPage# field is a disk address
- [Figure: virtual address space with an invalid entry, and physical memory]

Imagine a Global LRU
- Global – across all processes
- Idea – when a page is needed, pick the oldest page in the system
- Problems? Process mixes?
  - Interactive processes
  - Active large-memory sweep processes
- Mitigating damage?

Amdahl’s Law
- Gene Amdahl (IBM, then Amdahl)
- Noticed the bottlenecks to speedup
- Assume speedup affects one component
- New time = not affected + affected/speedup, where not affected = 1 - affected (fractions of the old time)
- In other words, diminishing returns
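A two-line sketch makes the diminishing returns visible. Times are normalized so that the old running time is 1, and `affected` is the fraction of that time spent in the component being sped up; the function names are illustrative.

```python
def new_time(affected, speedup):
    """Normalized running time after speeding up one component."""
    return (1.0 - affected) + affected / speedup


def overall_speedup(affected, speedup):
    """Whole-program speedup implied by the new time."""
    return 1.0 / new_time(affected, speedup)
```

If half of the time is affected, doubling that component's speed gives a new time of 0.75, and no speedup of that component, however large, can push the new time below 0.5.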

NT x86 Virtual Address Space Layouts

Layout 1
  0000 to 7FFFFFFF      Application code, globals, per-thread stacks, DLL code
  80000000              Kernel & exec, HAL, boot drivers
  C0000000              Process page tables, hyperspace
  C0800000 to FFFF      System cache, paged pool, nonpaged pool

Layout 2
  0000 to BFFFFFFF      3-GB user space
  C0000000 to FFFF      1-GB system space

Virtual Address Space in Win95 and Win98
  0000 to 7FFFFFFF      User accessible. Unique per process (per application), user mode
  80000000              Shared, process-writable (DLLs, shared memory, Win16 applications)
  C0000000 to FFFF      Win95 and Win98 operating system (Ring 0 components). Systemwide, kernel mode

Details with VM Management
- Create a process’s virtual address space
  - Allocate page table entries (reserve in NT)
  - Allocate backing store space (commit in NT)
  - Put related info into PCB
- Destroy a virtual address space
  - Deallocate all disk pages (decommit in NT)
  - Deallocate all page table entries (release in NT)
  - Deallocate all page frames

Page States (NT)
- Active: Part of a working set and a PTE points to it
- Transition: I/O in progress (not in any working sets)
- Standby: Was in a working set, but removed. A PTE points to it, not modified and invalid.
- Modified: Was in a working set, but removed. A PTE points to it, modified and invalid.
- Modified no write: Same as modified but no write back
- Free: Free with non-zero content
- Zeroed: Free with zero content
- Bad: hardware errors

Dynamics in NT VM
[Figure: page movement between the process working set and the standby, modified, free, zero, and bad lists. Transition labels: demand zero fault, page in or allocation, “soft” faults, working set replacement, modified writer, zero thread.]

Shared Memory
- How to destroy a virtual address space?
  - Link all PTEs
  - Reference count
- How to swap out/in?
  - Link all PTEs
  - Operation on all entries
- How to pin/unpin?
  - Link all PTEs
  - Reference count
- [Figure: page tables of Process 1 and Process 2 with writable (w) entries pointing at the same physical pages]

Copy-On-Write
- Child’s virtual address space uses the same page mapping as parent’s
- Make all pages read-only
- Make child process ready
- On a read, nothing happens
- On a write, generates an access fault
  - map to a new page frame
  - copy the page over
  - restart the instruction
- [Figure: parent and child page tables with read-only (r) entries pointing at the same physical pages]
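The sequence can be modeled in a toy simulation. Everything here is illustrative: the class names, the per-frame reference count (borrowed from the shared-memory discussion), and the assumption that every page was writable before the fork, so no "really read-only" bit is tracked.

```python
class Frame:
    """A physical page frame with a count of page tables mapping it."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refcount = 1


class AddressSpace:
    def __init__(self):
        self.table = {}  # virtual page -> [frame, writable]

    def fork(self):
        """Child shares every frame; both sides become read-only."""
        child = AddressSpace()
        for vpage, entry in self.table.items():
            entry[1] = False
            entry[0].refcount += 1
            child.table[vpage] = [entry[0], False]
        return child

    def read(self, vpage, offset):
        return self.table[vpage][0].data[offset]  # reads never fault

    def write(self, vpage, offset, value):
        entry = self.table[vpage]
        if not entry[1]:                      # access fault on a write
            frame = entry[0]
            if frame.refcount > 1:
                frame.refcount -= 1
                entry[0] = Frame(frame.data)  # new frame, page copied over
            entry[1] = True
        entry[0].data[offset] = value         # "restart" the store
```

The last sharer of a frame skips the copy and simply regains write permission, which is one common refinement rather than something the slide specifies.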

Issues of Copy-On-Write
- How to destroy an address space?
  - Same as shared memory case?
- How to swap in/out?
  - Same as shared memory
- How to pin/unpin?
  - Same as shared memory