vCAT: Dynamic Cache Management using CAT Virtualization


vCAT: Dynamic Cache Management using CAT Virtualization
Meng Xu, Linh Thi Xuan Phan, Hyon-Young Choi, Insup Lee
Department of Computer and Information Science, University of Pennsylvania

Trend: Multicore & Virtualization
• Cyber-physical systems are becoming increasingly complex
  – Require high performance and strong isolation
• Virtualization on multicore helps handle such complexity
  – Increases performance and reduces cost
• Challenge: Harder to achieve timing isolation
[Diagram: Collision avoidance, adaptive cruise control, pedestrian detection, and infotainment workloads in VMs 1–4 on a shared hypervisor]

Problem: Shared cache interference
• A task uses the cache to reduce its execution time
• Concurrent tasks may access the same cache area
  – Extra cache misses → increased WCET
  – Causes both intra-VM and inter-VM cache interference
[Diagram: Tasks 1–2 in VM 1 and tasks 3–4 in VM 2 collide in the shared cache through cores P1–P4]

Existing approach: Static management
• Statically assign non-overlapping cache areas to tasks (VMs)
• Pros: Simple to implement
• Cons: Low cache resource utilization
  – Unused cache area of one task (VM) cannot be reused by another
• Cons: Not always feasible
  – e.g., when the whole task set does not fit into the cache
[Diagram: Tasks 1–4 of VMs 1 and 2 statically pinned to disjoint regions of the cache]

Our approach: Dynamic management
• Dynamically assign disjoint cache areas to tasks (VMs)
• Pros: Enables cache reuse
  – Better utilization of the cache
  – Running tasks (VMs) can have larger cache areas, and thus smaller WCETs
[Diagram: Cache areas reassigned at run time as tasks 1–4 of VMs 1 and 2 start and stop]

Our approach: Dynamic management
• Challenge: How to achieve efficient dynamic cache management while guaranteeing isolation?
  – Efficiency: The dynamic management should incur small overhead
• Solution: Hardware-based
  – Increasingly many CPUs support cache partitioning
  – Benefit: Cache reconfiguration can be done very efficiently

Example: Intel processors that support cache partitioning
  Processor family                    Number of COTS processors
  Intel(R) Xeon(R) processor E5 v3    6 out of 48
  Intel(R) Xeon(R) processor D        15 out of 15
  Intel(R) Xeon(R) processor E3 v4    5 out of 5
  Intel(R) Xeon(R) processor E5 v4    117 out of 117
Source: https://github.com/01org/intel-cmt-cat and http://www.intel.com/

Contribution: vCAT
• vCAT: Dynamic cache management by virtualizing CAT
  – First work to achieve dynamic cache management for tasks in virtualization systems on commodity multicore hardware
• Achieves strong shared-cache isolation for tasks and VMs
• Supports dynamic cache management for tasks and VMs
  – The OS in a VM can dynamically allocate cache partitions for its tasks
  – The hypervisor can dynamically reconfigure cache partitions for VMs
• Supports cache sharing among best-effort VMs and tasks

Outline
• Introduction
• Background: Intel CAT
• Design & Implementation
• Evaluation

Intel Cache Allocation Technology (CAT)
• Divides the shared cache into α partitions (α = 20)
  – Similar to way-based cache partitioning
• Provides two types of model-specific registers
  – Each core has a PQR register
  – K Class of Service (COS) registers shared by all cores (K = 4)
[Diagram: A 64-bit PQR register holds a COS register ID; each COS register holds a 20-bit cache bit mask over the shared cache]

Intel Cache Allocation Technology (CAT)
• Divides the shared cache into α partitions (α = 20)
  – Similar to way-based cache partitioning
• Provides two types of model-specific registers
  – Each core has a PQR register
  – K Class of Service (COS) registers shared by all cores (K = 4)
• Configuring cache partitions for a core:
  – Step 1: Set the cache bit mask of the COS (e.g., 0x0000F)
  – Step 2: Link the core with a COS by setting its PQR
[Diagram: The core's PQR points to COS register 1, whose bit mask 0x0000F selects the lowest four cache partitions]
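The two-step configuration above can be sketched as a small user-space model. This is illustrative only: on real hardware both register types are MSRs (the per-core IA32_PQR_ASSOC and the shared L3 mask MSRs) written with wrmsr in ring 0, and the core count here is a made-up constant.

```c
#include <stdint.h>

#define NUM_PARTITIONS 20  /* alpha on the machine in the talk */
#define NUM_COS        4   /* K COS registers shared by all cores */
#define NUM_CORES      8   /* hypothetical core count for this model */

/* Toy model of the two register types. */
static uint32_t cos_mask[NUM_COS];   /* cache bit mask held by each COS */
static uint32_t core_pqr[NUM_CORES]; /* COS ID each core's PQR points to */

/* Build a contiguous run of `count` 1-bits starting at partition `lo`;
 * CAT requires cache bit masks to be contiguous. */
static uint32_t make_mask(int lo, int count) {
    return ((1u << count) - 1u) << lo;
}

/* Step 1: set the cache bit mask of a COS register. */
static void set_cos_mask(int cos, uint32_t mask) { cos_mask[cos] = mask; }

/* Step 2: link a core with a COS by setting the core's PQR register. */
static void link_core_to_cos(int core, int cos) { core_pqr[core] = cos; }

/* The partitions this core may now allocate into. */
static uint32_t core_partitions(int core) { return cos_mask[core_pqr[core]]; }
```

For example, `set_cos_mask(1, make_mask(0, 4))` followed by `link_core_to_cos(2, 1)` reproduces the slide's configuration: core P2 is restricted to the four partitions selected by mask 0x0000F.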

Intel CAT: Software support
• The Xen hypervisor supports Intel CAT
  – System operators can allocate cache partitions for VMs only
• Pros: Mitigates the interference among VMs
• Cons: Does not provide strong isolation among VMs
• Cons: Does not allow a VM to manage partitions for its tasks
  – Tasks in the same VM can still interfere with each other
• Cons: Supports only a limited number of VMs with different cache-partition settings
  – e.g., Xen supports ≤ 4 VMs with different cache-partition settings on our machine (Intel Xeon 2618L v3 processor)

Outline
• Introduction
• Background: Intel CAT
• Design & Implementation
• Evaluation

Goals
• Dynamically control cache allocations for tasks and VMs
  – Each VM should control the cache allocation for its own tasks
  – The hypervisor should control the cache allocation for the VMs
• Preserve the virtualization abstraction layer
  – Physical resources should not be exposed to VMs
• Guarantee cache isolation among tasks and VMs
  – Tasks should not interfere with each other after reconfiguration

Dynamic cache allocation for tasks
• To modify the cache configuration of a task, the VM needs to modify the cache control registers
  – BUT cache control registers are available only to the hypervisor
• One possible approach: Expose the registers to VMs
[Diagram: The VM directly modifies the COS register (mask 0xF) for core P2, selecting partitions of the physical cache]

Dynamic cache allocation for tasks
• To modify the cache configuration of a task, the VM needs to modify the cache control registers
  – BUT cache control registers are available only to the hypervisor
• One possible approach: Expose the registers to VMs
• Problem: Potential cache interference among VMs
  – e.g., a VM may overwrite the hypervisor's allocation decision
[Diagram: The VM writes mask 0xF00 to the COS register, claiming partitions that belong to another VM]

Dynamic cache allocation for tasks
• To modify the cache configuration of a task, the VM needs to modify the cache control registers
  – BUT cache control registers are available only to the hypervisor
• One possible approach: Expose the registers to VMs
• Problem: Potential cache interference among VMs
  – e.g., a VM may overwrite the hypervisor's allocation decision
  – Mitigation: The hypervisor validates each operation
[Diagram: The hypervisor validates the VM's COS modification before it takes effect]

Dynamic cache allocation for tasks
• To modify the cache configuration of a task, the VM needs to modify the cache control registers
  – BUT cache control registers are available only to the hypervisor
• One possible approach: Expose the registers to VMs
• Problem: Potential cache interference among VMs
  – e.g., a VM may overwrite the hypervisor's allocation decision
  – Mitigation: The hypervisor validates each operation
• Problem: The hypervisor needs to notify VMs of any changes
[Diagram: The hypervisor validates the VM's COS modification (mask 0xF) before applying it for core P2]
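The validation step mentioned above could look like the following check (a sketch with hypothetical helper names, not the paper's code): the hypervisor accepts a VM's requested COS mask only if it stays within the partitions allotted to that VM and satisfies CAT's requirement that masks be a non-empty contiguous run of 1-bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypervisor-side check of a VM's COS write (hypothetical names).
 * Uses the GCC/Clang builtin __builtin_ctz to count trailing zeros. */
static bool validate_cos_write(uint32_t requested, uint32_t vm_allotted) {
    if (requested == 0)
        return false;                /* empty masks are not allowed */
    if (requested & ~vm_allotted)
        return false;                /* would touch another VM's partitions */
    /* Contiguity: after dropping trailing zeros, mask must be 2^n - 1. */
    uint32_t shifted = requested >> __builtin_ctz(requested);
    return (shifted & (shifted + 1)) == 0;
}
```

A write of 0xF00 by a VM allotted 0x0FF, as in the slide's example, fails the subset test and is rejected, so the VM cannot overwrite the hypervisor's allocation decision.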

vCAT: Key insight
• Virtualize cache partitions and expose virtual caches to VMs
  – The hypervisor assigns virtual and physical cache partitions to VMs
  – The VM controls the allocation of its assigned virtual partitions to tasks
  – The hypervisor translates the VM's operations on virtual partitions into operations on the physical partitions
[Diagram: The VM operates on its virtual cache; the hypervisor translates the operation into a COS register update (mask 0xF0) for core P2 on the physical cache]
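The translation step can be sketched as follows, assuming for simplicity that the VM's physical partitions form one contiguous window (the function name and the contiguity assumption are ours, not the paper's):

```c
#include <assert.h>
#include <stdint.h>

/* The VM owns `count` contiguous physical partitions starting at `base`;
 * its virtual partition i is backed by physical partition base + i.
 * A VM operation on a virtual-partition mask is translated by relocating
 * the mask into that physical window. */
static uint32_t virt_to_phys_mask(uint32_t vmask, int base, int count) {
    assert((vmask >> count) == 0); /* VM may only name its own partitions */
    return vmask << base;
}
```

For example, a VM holding physical partitions 4–11 that allocates its first four virtual partitions (virtual mask 0xF) ends up with physical mask 0xF0, matching the slide's diagram.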

Challenge 1: No control over cache-hit requests
• A task's contents stay in the cache until they are evicted
• Problem: A task can access its content in its previous partitions via cache hits → interference with another task
• Not explicitly documented in Intel's SDM
  – We confirmed this limitation with experiments (available in the paper)
[Diagram: A cache hit from a core lands in a partition that now belongs to another task, causing a collision]

Solution: Cache flushing
• The task's content in its previous partitions is no longer valid
• Approach 1: Flush each memory address of the task
  – Pros: Does not affect the other tasks' cache content
  – Cons: Slow when the task's working set size is large (> 8.46 MB)
• Approach 2: Flush the entire cache
  – Pros: Efficient when the task's working set size is large (> 8.46 MB)
  – Cons: Flushes the other tasks' cache content as well
• vCAT provides both approaches to system operators
  – Discussion of the tradeoffs and flushing heuristics is in the paper
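The two approaches and their crossover point can be sketched as below. This is not the paper's code: the per-line flush is modeled as a no-op so the sketch stays portable (on x86 each line would be flushed with clflush, and a whole-cache flush would use the privileged wbinvd instruction), and only the 8.46 MB crossover constant comes from the talk.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64
#define CROSSOVER_BYTES  ((size_t)(8.46 * 1024 * 1024)) /* from the talk */

/* Approach 1: flush only the task's own lines, one cache line at a time. */
static void flush_line(volatile const void *p) { (void)p; /* clflush on x86 */ }

static void flush_working_set(const char *buf, size_t len) {
    for (size_t off = 0; off < len; off += CACHE_LINE_BYTES)
        flush_line(buf + off);
}

/* Heuristic: beyond the measured crossover, Approach 2 (whole-cache
 * flush) is cheaper than per-line flushing of the working set. */
static bool prefer_full_flush(size_t wss_bytes) {
    return wss_bytes > CROSSOVER_BYTES;
}
```

The heuristic captures the tradeoff on the slide: per-line flushing preserves other tasks' cache content but its cost grows linearly with the working set size, while a full flush has a constant cost.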

Challenge 2: Contiguous allocation constraint
• Unallocated partitions may NOT be contiguous
  – Fragmentation of cache partitions in dynamic allocation → low cache resource utilization
[Diagram: VMs 1–3 hold scattered physical partitions; a new allocation that would span non-contiguous partitions is invalid]

Solution: Partition defragmentation
• Rearrange the partitions to form contiguous regions
  – The hypervisor rearranges physical cache partitions for VMs
  – Each VM rearranges virtual cache partitions for its tasks
[Diagram: Fragmented physical partitions are compacted so that each VM's partitions are contiguous]
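The compaction itself can be sketched as a left-packing pass (hypothetical data layout, not the paper's code): given how many partitions each VM holds, assign every VM one contiguous run from the bottom up, leaving all free partitions contiguous at the top. In the real system the hypervisor must also flush the affected cache content when it moves a VM's partitions, as discussed under Challenge 1.

```c
#include <stdint.h>

/* Compute each VM's new physical bit mask after compaction.
 * alloc_count[vm] = number of partitions VM vm holds. */
static void defragment(const int *alloc_count, int num_vms, uint32_t *new_mask) {
    int next = 0; /* next unassigned partition index */
    for (int vm = 0; vm < num_vms; vm++) {
        new_mask[vm] = ((1u << alloc_count[vm]) - 1u) << next;
        next += alloc_count[vm];
    }
}
```

For example, VMs holding 2, 3, and 1 scattered partitions end up with the contiguous masks 0x03, 0x1C, and 0x20, and every remaining partition above them is free and contiguous.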

vCAT: Design summary
• Introduce virtual cache partitions
  – Enables a VM to control the cache allocation for its tasks without breaking the virtualization abstraction
• Flush the cache when the cache partitions of tasks (VMs) are changed
  – Guarantees cache isolation among tasks and VMs under dynamic cache management
• Defragment non-contiguous cache partitions
  – Enables better cache utilization
• Refer to the paper for technical details and other design considerations
  – e.g., how to allocate and de-allocate partitions for tasks and VMs
  – e.g., how to support an arbitrary number of tasks and VMs with different cache-partition settings

Implementation
• Hardware: Intel Xeon 2618L v3 processor
  – The design works for any processor that supports both virtualization and hardware-based cache partitioning
• Implementation based on Xen 4.8 and LITMUS^RT 2015.1
  – LITMUS^RT: Linux Testbed for Multiprocessor Scheduling in Real-Time Systems
• ~5K lines of code (LoC) in total
  – Hypervisor (Xen): 3264 LoC
  – VM (LITMUS^RT): 2086 LoC
• Flexible to add new cache management policies

Outline
• Introduction
• Background: Intel CAT
• Design
• Evaluation


vCAT Evaluation: Goals
• How much overhead is introduced by vCAT?
• How much WCET reduction is achieved through cache isolation?
• How much real-time performance improvement does vCAT enable?
  – Static management vs. no management
  – Dynamic management vs. static management
• The rest of the evaluation is available in the paper

vCAT run-time overhead
• Static cache management
  – Overhead occurs only when a task/VM is created
  – Negligible overhead: ≤ 1.12 µs
• Dynamic cache management
  – Overhead occurs whenever the partitions of a task/VM are changed
  – Reasonably small overhead: ≤ 27.1 ms
  – The value depends on the workload's working set size (WSS): Overhead = min{3.23 ms/MB × WSS, 27.1 ms}
• More details can be found in the paper
  – Computation of the overhead value based on the WSS
  – Experiments showing the factors that contribute to the overhead
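The overhead model above can be expressed directly (constants are the ones measured in the talk; WSS in MB, result in ms):

```c
/* Dynamic-management overhead: per-address flushing scales with the
 * working set size until a full-cache flush becomes cheaper. */
static double dynamic_overhead_ms(double wss_mb) {
    double per_address = 3.23 * wss_mb; /* ms, grows with WSS */
    double full_flush  = 27.1;          /* ms, constant cap */
    return per_address < full_flush ? per_address : full_flush;
}
```

So a 2 MB working set costs about 6.46 ms to reconfigure, and any working set larger than roughly 8.4 MB hits the 27.1 ms full-flush cap, consistent with the crossover point from the cache-flushing slide.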

Static management: Evaluation setup
• PARSEC benchmarks
  – Converted to LITMUS^RT-compatible real-time tasks
• Randomly generate real-time parameters for the benchmarks to create real-time tasks
[Diagram: Benchmark VMs running PARSEC tasks and a polluter VM running a cache-intensive task; VCPUs VP1–VP4 are pinned to cores P1–P4 above the shared cache]

Static management vs. No management
• Static management improves system utilization significantly
  – Improves system utilization by 1.0 / 0.3 ≈ 3.3x
[Graph: Fraction of schedulable task sets vs. VCPU utilization for the streamcluster benchmark, comparing no management and static management]

Static management vs. No management
• The more cache-sensitive the workload is, the greater the performance benefit

Dynamic management: Evaluation setup
• Create workloads that have dynamic cache demand
• Dual-mode tasks: Switch from mode 1 to mode 2 after 1 minute
  – Type 1: The task increases its utilization by decreasing its period
  – Type 2: The task decreases its utilization by increasing its period
[Diagram: Benchmark VMs running type-1 and type-2 dual-mode tasks and a polluter VM running a cache-intensive task; VCPUs are pinned to cores above the shared cache]

Dynamic management vs. Static management
• Dynamic management outperforms static management significantly
  – Improves system utilization by 0.6 / 0.2 = 3x
[Graph: Fraction of schedulable task sets vs. VCPU utilization, comparing static and dynamic management]

Conclusion
• vCAT: A dynamic cache management framework for virtualization systems using CAT virtualization
  – Provides strong isolation among tasks and VMs
  – Supports both static and dynamic cache allocation for both real-time and best-effort tasks
  – Evaluation shows that dynamic management substantially improves schedulability compared to static management
• Future work
  – Develop more sophisticated cache resource allocation policies for tasks and VMs in virtualization systems
  – Apply vCAT to real systems, e.g., automotive systems and cloud computing