Agile Paging: Exceeding the Best of Nested and Shadow Paging
Jayneel Gandhi, Mark D. Hill, Michael M. Swift

Executive Summary
Problem: Virtualization is valuable but has high overheads with larger workloads (up to 70% slower than native)
Existing choices:
1. Nested Paging: slow page walk but fast page table updates
2. Shadow Paging: fast page walk but slow page table updates
Can we get the best of both for the same address space (or the same page walk)?
Yes, Agile Paging: use shadow paging and sometimes switch to nested paging within the same page walk (at most 4% slower than native)

Outline
• Motivation
• Agile Paging
• Results
• Summary

Virtualization Overview
[Diagram, top to bottom: APP, Guest OS, VMM, Hardware]
Benefits:
✓ Foundation of our cloud infrastructure
✓ Provides on-demand virtual instances
✓ Helps server consolidation
Problem: The overhead of virtualizing memory is high (up to 70% slower than unvirtualized)

Virtualizing Memory
[Diagram, top to bottom: APP, Guest OS, VMM, Hardware]
gVA (Guest Virtual Address) → Guest Page Table (guest OS) → gPA (Guest Physical Address) → Nested Page Table (VMM) → hPA (Host Physical Address)

Virtualizing Memory
gVA → Guest Page Table → gPA → Nested Page Table → hPA
Two techniques to manage both page tables:
1. Nested Paging (hardware)
2. Shadow Paging (software)
Evaluated on two axes: page walk latency and page table updates

Unvirtualized x86-64 Translation
VA (Virtual Address) → page table rooted at CR3 → PA (Physical Address)
At most 4 memory accesses per page walk

1. Nested Paging (Hardware)
Longer page walk: the walk of a gVA starts at gCR3, and every guest page table pointer is a gPA that must itself be translated to an hPA through the nested page table
At most 5 + 5 + 5 + 5 + 4 = 24 memory accesses

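The count of 24 follows from the shape of the two-dimensional walk. A minimal sketch of that arithmetic, assuming 4-level guest and nested page tables and no page walk caches; the function names are illustrative and do not come from the talk:

```python
def native_walk_accesses(levels: int = 4) -> int:
    """Worst-case memory accesses for an unvirtualized walk: one per level."""
    return levels


def nested_walk_accesses(guest_levels: int = 4, nested_levels: int = 4) -> int:
    """Worst-case memory accesses for a two-dimensional (nested) page walk."""
    # Each guest level is reached through a gPA pointer (gCR3 first), which
    # costs a full nested walk to translate, plus one read of the guest entry.
    per_guest_level = nested_levels + 1
    # The final gPA produced by the guest walk needs one more nested walk.
    final_translation = nested_levels
    return guest_levels * per_guest_level + final_translation


print(native_walk_accesses())  # 4
print(nested_walk_accesses())  # 24
```

The same total can be written as (4 + 1) × (4 + 1) − 1 = 24.
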
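2. Shadow Paging (Software)
[Diagram, top to bottom: APP, Guest OS, VMM, Hardware]
The guest page table (gVA → gPA) is marked read-only (RO) for the guest OS
The VMM maintains the nested page table (gPA → hPA) and a shadow page table that maps gVA directly to hPA
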
2. Shadow Paging (Software)
Shorter page walk: the hardware walks only the shadow page table, rooted at sCR3, from gVA to hPA
At most 4 memory accesses

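Conceptually, the shadow page table is the composition of the guest and nested page tables. A minimal sketch, assuming each table is flattened to a dictionary from page number to page number (real tables are multi-level radix trees); the names and page numbers are made up for illustration:

```python
def build_shadow_page_table(guest_pt: dict[int, int],
                            nested_pt: dict[int, int]) -> dict[int, int]:
    """Compose gVA->gPA with gPA->hPA to get a direct gVA->hPA mapping."""
    shadow = {}
    for gva_page, gpa_page in guest_pt.items():
        # Only guest pages that the VMM has backed with host memory
        # can appear in the shadow table.
        if gpa_page in nested_pt:
            shadow[gva_page] = nested_pt[gpa_page]
    return shadow


guest_pt = {0x10: 0x2, 0x11: 0x3, 0x12: 0x7}   # gVA page -> gPA page
nested_pt = {0x2: 0xA0, 0x3: 0xB4}             # gPA page -> hPA page
shadow_pt = build_shadow_page_table(guest_pt, nested_pt)
print(shadow_pt)
```

Because the shadow table is derived data, any guest write to the guest page table leaves it stale until the VMM recomputes the affected entries, which is why the guest page table is kept read-only in this scheme.
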
Page Table Updates
1. Nested Paging: the guest updates its page table in place (fast in-place update)
2. Shadow Paging: the guest page table is read-only, so every write traps to the VMM, which applies the update and keeps the shadow page table consistent (slow mediated update)

Key Observation
Guest virtual address space:
Fully static address space: Shadow Paging preferred
Fully dynamic address space: Nested Paging preferred
Reality!!! A small fraction of the address space is dynamic

Key Observation
[Diagram: guest page table tree rooted at gCR3, with parts of the tree labeled Shadow and parts labeled Nested]

Outline
• Motivation
• Agile Paging
• Results
• Summary

Agile Paging
• Start the page walk in shadow mode: achieves fast TLB misses
• Optionally switch to nested mode: allows fast in-place updates
Two parts of the design:
1. Mechanism
2. Policy

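1. Mechanism
[Diagram: the walk of a gVA starts in the shadow page table at sCR3. Shadow entries marked 1 switch the walk to the guest page table (rooted at gCR3), where it continues in nested mode through the nested page table from gPA to hPA. The part of the guest page table still covered by shadow mode is read-only.]
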
1. Mechanism: Example Page Walk
The walk of a gVA starts at sCR3 in shadow mode and switches modes at level 4 of the guest page table, finishing in nested mode at the hPA
At most 1 + 1 + 1 + 5 = 8 memory accesses

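The example generalizes to other switch points. A minimal sketch, assuming 4-level tables, no page walk caches, and that a shadow entry that switches modes already holds the host address of the next guest page table level, so only the levels walked in nested mode pay the extra nested translation; `agile_walk_accesses` is an illustrative name, not part of the design:

```python
def agile_walk_accesses(nested_guest_levels: int,
                        levels: int = 4,
                        nested_levels: int = 4) -> int:
    """Worst-case accesses when the last `nested_guest_levels` guest levels
    are walked in nested mode and the levels before them in shadow mode."""
    if not 0 <= nested_guest_levels <= levels:
        raise ValueError("nested_guest_levels must be between 0 and levels")
    shadow_levels = levels - nested_guest_levels
    # Shadow levels cost one access each; each guest level walked in nested
    # mode costs one guest entry read plus a nested walk of the gPA it yields.
    return shadow_levels + nested_guest_levels * (nested_levels + 1)


for n in range(5):
    print(n, agile_walk_accesses(n))
```

With one guest level in nested mode (the slide's "switch at level 4") this gives 1 + 1 + 1 + 5 = 8; with none it gives the pure shadow cost of 4. A walk that is fully nested from gCR3 costs 24 rather than 20, because gCR3 itself is a gPA that needs a nested walk.
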
2. Policy: Shadow → Nested
State transitions for a part of the guest page table:
Start → Shadow
Shadow → Shadow (1 Write): write to page table (VMM trap)
Shadow (1 Write) → Nested: write to page table (VMM trap)
Nested: subsequent writes (no VMM traps)

2. Policy: Nested → Shadow
State transitions for a part of the guest page table:
Start → Shadow
Shadow → Shadow (1 Write): write to page table (VMM trap)
Shadow (1 Write) → Nested: write to page table (VMM trap)
Nested: subsequent writes (no VMM traps)
Nested → Shadow: on timeout, move non-dirty parts back
Use dirty bits to track writes to the guest page table

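The two policy slides together describe a small state machine. A minimal sketch of it for one part of the guest page table, under stated assumptions: the class and method names are hypothetical, the dirty bit is cleared on every timeout check, and a timeout in the "Shadow (1 Write)" state does nothing (the slides do not say what happens there):

```python
from enum import Enum


class Mode(Enum):
    SHADOW = "shadow"
    SHADOW_ONE_WRITE = "shadow (1 write)"
    NESTED = "nested"


class GuestPageTablePart:
    """Policy state for one part of the guest page table (illustrative)."""

    def __init__(self) -> None:
        self.mode = Mode.SHADOW
        self.dirty = False      # set by writes while in nested mode
        self.vmm_traps = 0

    def guest_write(self) -> None:
        if self.mode is Mode.SHADOW:
            self.vmm_traps += 1             # first write traps to the VMM
            self.mode = Mode.SHADOW_ONE_WRITE
        elif self.mode is Mode.SHADOW_ONE_WRITE:
            self.vmm_traps += 1             # second write traps, then switch
            self.mode = Mode.NESTED
        else:
            self.dirty = True               # in-place update, no VMM trap

    def timeout(self) -> None:
        if self.mode is Mode.NESTED:
            if not self.dirty:
                self.mode = Mode.SHADOW     # move non-dirty parts back
            self.dirty = False              # assumption: reset for next interval


part = GuestPageTablePart()
for _ in range(5):
    part.guest_write()
print(part.mode, part.vmm_traps)
```

Five writes cost two VMM traps here, where pure shadow paging would trap on all five.
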
Outline
• Motivation
• Agile Paging
• Results
• Summary

Methodology
• Measure the cost of page walks on real hardware
• Intel 12-core Sandy Bridge with 96 GB memory
• 64-entry L1 TLB + 512-entry L2 TLB, 4-way associative, for 4 KB pages
• 32-entry L1 TLB, 4-way associative, for 2 MB pages
• Prototype VMM and emulate hardware in Linux v3.12.13
• BadgerTrap for online analysis of TLB misses and to emulate agile paging
• Linear model to predict performance
• Workloads: big-memory workloads, SPEC 2006, BioBench, PARSEC

Performance Results
[Bar chart; B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging]
Chart callouts: "Measured using performance counters" and "Modeled based on emulator: BadgerTrap"
Solid bottom bar: page walk overhead; hashed top bar: VMM overheads

Performance Results
Nested Paging has high overheads of TLB misses: the effect of the longer page walk
[Bar chart; B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging; labeled overheads: 28%, 19%, 18%, 6%]
Solid bottom bar: page walk overhead; hashed top bar: VMM overheads

Performance Results
Shadow Paging has high overheads of VMM interventions
[Bar chart; B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging; labeled overheads: 70%, 28%, 11%, 30%, 18%, 19%, 6%, 6%]
Solid bottom bar: page walk overhead; hashed top bar: VMM overheads

Performance Results
Agile paging consistently performs better than both techniques
[Bar chart; B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging; labeled overheads: 28%, 70%, 11%, 2%, 19%, 30%, 18%, 6%, 4%, 2%, 6%, 3%]
Solid bottom bar: page walk overhead; hashed top bar: VMM overheads

Summary
Problem: Virtualization is valuable but has high overheads with larger workloads (up to 70% slower than native)
Existing choices:
1. Nested Paging: slow page walk but fast page table updates
2. Shadow Paging: fast page walk but slow page table updates
Can we get the best of both for the same address space (or the same page walk)?
Yes, Agile Paging: use shadow paging and sometimes switch to nested paging within the same page walk (at most 4% slower than native)

Questions?

Can we get the best of both worlds?
                       Nested Paging    Shadow Paging       Agile Paging
Dimensions             2D               1D                  1D
# of memory accesses   24               4                   ~4-5
Page table updates     Fast in-place    Slow out-of-place   Fast in-place

Short-Lived Processes
Issue: The cost of creating a shadow page table is high
Solution:
1. Start shadow mode after 1 sec for agile paging
2. Give user mode access to run only in nested mode

Accessed/Dirty Bits
Issue: Shadow mode is slow for setting A/D bits; keeping the shadow and guest page tables coherent causes VMM traps
Solution: Hardware optimization
✓ Intel sets accessed/dirty bits on both guest and nested page tables
✓ Broadwell supports multiple page table walkers per core
✓ We propose to write A/D bits on all three page tables by hardware

Context Switches
Issue: Intra-guest context switches with shadow mode are slower; the guest OS does not know that the shadow page table exists, so each switch causes a VMM trap
Solution: Hardware optimization
✓ Add a small VMM-managed cache of guest CR3 → shadow CR3 mappings
✓ Looked up by hardware for a matching entry on a context switch
✓ On a hit, no VMM trap is required

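The behavior of such a cache can be modeled in a few lines of software. A minimal sketch, assuming LRU replacement and a capacity of two entries (the slide specifies neither); `Cr3Cache`, `vmm_install`, and `context_switch` are hypothetical names, and the CR3 values are made up:

```python
from collections import OrderedDict
from typing import Callable


class Cr3Cache:
    """Software model of a small gCR3 -> sCR3 cache filled by the VMM."""

    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[int, int] = OrderedDict()
        self.vmm_traps = 0

    def vmm_install(self, gcr3: int, scr3: int) -> None:
        self.entries[gcr3] = scr3
        self.entries.move_to_end(gcr3)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used

    def context_switch(self, gcr3: int, vmm_lookup: Callable[[int], int]) -> int:
        if gcr3 in self.entries:               # hit: no VMM trap needed
            self.entries.move_to_end(gcr3)
            return self.entries[gcr3]
        self.vmm_traps += 1                    # miss: trap, VMM supplies sCR3
        scr3 = vmm_lookup(gcr3)
        self.vmm_install(gcr3, scr3)
        return scr3


vmm_table = {0x1000: 0xA000, 0x2000: 0xB000, 0x3000: 0xC000}
cache = Cr3Cache(capacity=2)
for gcr3 in (0x1000, 0x2000, 0x1000, 0x2000):
    cache.context_switch(gcr3, vmm_table.__getitem__)
print(cache.vmm_traps)
```

Two guest processes that alternate trap once each and then switch without VMM involvement for as long as both entries stay cached.
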
Why does agile paging work?
Switch level   Shadow   L4     L3     L2   L1   Nested   Avg.
Mem. acc.      4        8      12     16   20   24
graph500       99.8%    0.2%   -      -    -    -        4.01
memcached      88.2%    4.5%   7.3%   -    -    -        4.76
canneal        94.7%    4.6%   0.7%   -    -    -        4.24
dedup          91.4%    2.2%   6.4%   -    -    -        4.60
Brings the average number of memory accesses down to ~4-5 from 24

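The averages are weighted sums of the per-switch-level access counts. A short sketch that recomputes them from the percentages on this slide; the dictionary layout and function name are illustrative choices:

```python
# Worst-case memory accesses per walk for each switch level (from the slide).
MEM_ACCESSES = {"Shadow": 4, "L4": 8, "L3": 12, "L2": 16, "L1": 20, "Nested": 24}

# Percentage of page walks that switch at each level, per workload.
WALK_MIX = {
    "graph500":  {"Shadow": 99.8, "L4": 0.2},
    "memcached": {"Shadow": 88.2, "L4": 4.5, "L3": 7.3},
    "canneal":   {"Shadow": 94.7, "L4": 4.6, "L3": 0.7},
    "dedup":     {"Shadow": 91.4, "L4": 2.2, "L3": 6.4},
}


def average_accesses(mix: dict[str, float]) -> float:
    """Weighted average of memory accesses per page walk."""
    return sum(pct / 100.0 * MEM_ACCESSES[level] for level, pct in mix.items())


for workload, mix in WALK_MIX.items():
    print(f"{workload}: {average_accesses(mix):.2f}")
```

For memcached, for example, 0.882 × 4 + 0.045 × 8 + 0.073 × 12 = 4.764, which rounds to the 4.76 shown in the table.
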
Transparent Huge Page (2 MB)
[Bar chart; B: Unvirtualized, N: Nested Paging, S: Shadow Paging, A: Agile Paging; labeled overheads: 13%, 4%, 68%, 14%, 2%, 10%, 5%, 2%, 14%, 3%, 6%, 2%]
Solid bottom bar: page walk overhead; hashed top bar: VMM overheads

Design Components
Hardware
• Three page table pointers
• Point to each of the page tables
• Enhanced page table walker
• Interprets the switching bit
• Bridges the two state machines
VMM
• Manage three page tables
• Incremental from shadow paging
• Policies for changing modes
• Encapsulate policies in the VMM