Are You Insured Against Your Noisy Neighbor - A VSPERF Use Case
Sunku.Ranganath@Intel.com, Sridhar.Rao@Spirent.com, Shreya.Pandita@Spirent.com
Agenda
• Intro to VSPERF
• Intro to Intel RDT & Spirent Cloud Stress
• Demo: Noisy Neighbor impact with VSPERF
• Intro to RMD
• Demo: Mitigating Noisy Neighbor impact with RMD
• Call to Action
Intro to VSPERF
• Define, implement and execute a test suite to characterize the performance of a virtual switch in the NFVi
• Based on industry standards
• Ability to assign and scale CPUs for VNFs
• Supports multiple traffic generators and virtual switches with various VNF deployment scenarios
Common Contention in Cloud Deployments
• Minimizing Total Cost of Ownership (TCO) often leads to oversubscription
• Quality of Service (QoS) requirements
  – Service Level Agreements (SLAs)
  – Metrics: service availability, throughput, latency, scaling
• Cloud vs. Network Function Virtualization deployments
  – Optimizing CPU resource utilization often leads to shared resource contention
• Multi-tenants & automated workload placement
  – Lack of control of cache by the orchestration layer
Intel® Resource Director Technology (Intel® RDT)
Cache Monitoring Technology (CMT)
• Identify misbehaving applications and reschedule according to priority
• Cache occupancy reported on a per Resource Monitoring ID (RMID) basis (advanced telemetry)
Cache Allocation Technology (CAT)
• Last-Level Cache partitioning mechanism enabling separation and prioritization of apps or VMs
• Misbehaving threads can be isolated to increase determinism
Figure labels: cores running apps, sharing the Last-Level Cache in front of DRAM
Key Concepts: Class of Service (CLOS)
• Threads/Apps/VMs grouped into Classes of Service (CLOS) for resource allocation
• Resource usage of any thread, app, VM, or a combination controlled with a CLOS
• Specify the CLOS for a thread via the per-core IA32_PQR_ASSOC (“PQR”) MSR
• Configure resource guidelines per CLOS
• Associate threads into CLOS
• Hardware manages resource allocation
Bitmask layouts:
• Default bitmask: LLC is all shared
• Overlapped bitmask: LLC is partially shared. Low-priority workloads are placed in a COS with shared resources
• Isolated bitmask: LLC is allocated separately to each individual COS
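The three bitmask layouts can be sketched as capacity bitmasks, one per COS. This is a minimal illustration, not taken from the slides: the 11-way LLC, the way ranges, and the helper names (`way_mask`, `is_contiguous`, `shared_ways`) are all assumptions. The one hardware rule it relies on is that CAT capacity bitmasks must be a single contiguous run of set bits.

```python
NUM_WAYS = 11  # assumption: illustrative LLC way count; the real value is platform-specific


def way_mask(first_way: int, n_ways: int) -> int:
    """Return a bitmask with n_ways contiguous bits set, starting at first_way."""
    if n_ways < 1 or first_way < 0 or first_way + n_ways > NUM_WAYS:
        raise ValueError("mask must cover between 1 and NUM_WAYS contiguous ways")
    return ((1 << n_ways) - 1) << first_way


def is_contiguous(mask: int) -> bool:
    """Check that the mask is one unbroken run of set bits."""
    if mask <= 0:
        return False
    shifted = mask // (mask & -mask)  # drop trailing zero bits
    return shifted & (shifted + 1) == 0


def shared_ways(a: int, b: int) -> int:
    """Count the cache ways that two masks have in common."""
    return bin(a & b).count("1")


# Hypothetical two-COS layouts for each case named on the slide.
layouts = {
    "default": {"COS0": way_mask(0, 11), "COS1": way_mask(0, 11)},
    "overlapped": {"COS0": way_mask(0, 7), "COS1": way_mask(4, 7)},
    "isolated": {"COS0": way_mask(0, 6), "COS1": way_mask(6, 5)},
}

for name, cos in layouts.items():
    a, b = cos["COS0"], cos["COS1"]
    print(f"{name:<10} COS0={a:#05x} COS1={b:#05x} shared ways={shared_ways(a, b)}")
```

In the overlapped case only ways 4 to 6 are contended, which is where a low-priority workload would be confined.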
Noisy Neighbor Impact & VSPERF
Pod 12 – Node 4
• VSPERF integration with Collectd provides insight into NFVi data plane resource utilization
• VSPERF automates the deployment of the Phy-VM-Phy setup
• Cloud Stress as a Noisy Neighbor
Figure: A Phy-VM-Phy deployment on NUMA node 0 of an Intel Xeon platform. Diagram labels: traffic generator (2000 flows); VNF 1 (testpmd L2FWD) with virtio ports 0 and 1; Cloud Stress noisy neighbor; Open vSwitch bridge with DPDK PMDs; Linux kernel; tenant ports 1 and 2; Internet port on the onboard Intel GbE NIC; core allocations of 2, 4, 4 and 3 dedicated cores.
Intro to Spirent CloudStress
• Web-based infrastructure validation application
• Performance and capacity planning for Compute, Memory, Storage and Network I/O
• Dynamic workloads to validate NFV/Cloud infrastructure
Intro to Spirent CloudStress
Figure labels: Virtual Firewall; NFVi (Compute, Network, Storage)
Creating Virtual Machine Profiles
Figure labels: Spirent CloudStress; NFVi Under Test (Compute, Network, Storage)
Capacity Planning
Figure labels: NFVi Under Test (Compute, Network, Storage)
Cloud Stress as Noisy Neighbor
Assess the impact of resource contention on VNFs and/or NFV service chains.
Figure labels: VNF Performance; Noisy Neighbor; vRouter, vFW, vCPE; VNF; NFVi Under Test (Compute, Network, Storage)
Demo: Impact of Noisy Neighbor on VNF Under Test
Planning For Resources
• Remote analysis of resource utilization and granular resource control are not optimal for latency-sensitive workloads
• Planning for your cache:
  – LLC profiling
  – LLC considerations
  – Class of Service construction
Class Of Service Construction
• Considerations:
  – Capacity of cache
  – Would you require DDIO?
  – Isolated vs. overlapping cache COS
It is crucial to have a local agent on the host to control and enforce COS associations for latency-sensitive workloads.
Figure: Traffic flow from NIC to VMs (labels: Total LLC; non-DDIO packet path)
Enabling Options
• Platform Quality of Service (pqos) tool
  – User space tool requiring access to Intel MSRs
  – Associates LLC on a per-core-ID basis
  – https://github.com/intel-cmtcat
• Resctrl file system
  – Extension of kernfs
  – Associates by PID, on a per-thread basis
  – Kernel 4.10+
• Resource Management Daemon
  – Newly open sourced
Figure: Kernel resctrl fs
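The resctrl option can be sketched as follows. This is a hedged illustration rather than the setup used in the demo: it assumes resctrl is already mounted at /sys/fs/resctrl and that the caller has root privileges. The group name, masks, and PID in the usage comment are hypothetical. The `root` parameter exists only so the logic can be exercised against an ordinary directory.

```python
from pathlib import Path

RESCTRL_ROOT = Path("/sys/fs/resctrl")  # standard mount point, assumed to be mounted


def schemata_line(masks_by_cache_id: dict[int, int]) -> str:
    """Format an L3 schemata entry, one hex capacity mask per L3 cache ID."""
    parts = [f"{cid}={mask:x}" for cid, mask in sorted(masks_by_cache_id.items())]
    return "L3:" + ";".join(parts)


def assign_cos(group: str, masks_by_cache_id: dict[int, int], pids: list[int],
               root: Path = RESCTRL_ROOT) -> Path:
    """Create a resctrl group, set its L3 masks, and move the given tasks into it."""
    group_dir = Path(root) / group
    group_dir.mkdir(exist_ok=True)  # on resctrl, mkdir creates a new resource group
    (group_dir / "schemata").write_text(schemata_line(masks_by_cache_id) + "\n")
    for pid in pids:
        # resctrl takes one task ID per write
        with open(group_dir / "tasks", "w") as tasks:
            tasks.write(f"{pid}\n")
    return group_dir


print(schemata_line({0: 0x7F0, 1: 0x7F0}))  # prints L3:0=7f0;1=7f0

# Hypothetical usage on a host with resctrl mounted (requires root):
# assign_cos("vnf1", {0: 0x7F0, 1: 0x7F0}, pids=[12345])
```

Because association is by task ID, each thread of a VM or PMD process has to be written to the group's tasks file individually.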
Resource Management Daemon
What is RMD
• A Linux daemon that runs on individual hosts, with pluggable interfaces to interact with orchestration, monitoring and enforcement layers
• Communicates across control and data plane using a REST API
• Receives resource policy from the orchestration layer and enforces it on the host
• Enforces resource allocation using kernel interfaces like resctrlfs or libraries like libpqos
Hosted at https://github.com/intel/rmd
Why RMD
1. Complex usage (mask)
2. Real-time tuning
3. Varying platforms (cache size, bandwidth, NUMA)
4. Fast-shifting workloads (local policy)
5. Uniform interface for RDT
Simple API interface
RMD Architecture
Open sourced on Nov 9th, 2017
• Provides the construct of overlapped and isolated COSes
• Helps tune the LLC for optimal performance
• Simple to use with max_cache and min_cache constructs
Diagram legend: WIP sections
Configuration   Policy details
osgroup         cache ways reserved for operating system usage
infragroup      cache ways shared with other workloads
guarantee       allocate cache for workload, max_cache == min_cache > 0
besteffort      allocate cache for workload, max_cache > min_cache > 0
shared          allocate cache for workload, max_cache == min_cache == 0
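The max_cache and min_cache rules in the table can be expressed as a small classifier. This is a sketch of the rules as stated on the slide, not RMD's own code. The function name `classify_policy` is hypothetical, and rejecting combinations the table does not list is an assumption.

```python
def classify_policy(max_cache: int, min_cache: int) -> str:
    """Map a (max_cache, min_cache) request to the workload pool named on the slide."""
    if min_cache < 0 or max_cache < 0:
        raise ValueError("cache way counts cannot be negative")
    if max_cache == min_cache:
        # Equal bounds: a fixed allocation, or none at all.
        return "guarantee" if max_cache > 0 else "shared"
    if max_cache > min_cache > 0:
        # A floor of min_cache ways that may grow up to max_cache.
        return "besteffort"
    # Assumption: anything else (for example max_cache < min_cache) is invalid.
    raise ValueError("combination not covered by the policy table")


for request in [(4, 4), (4, 2), (0, 0)]:
    print(request, "->", classify_policy(*request))
```

The osgroup and infragroup rows are not covered here because the table defines them by purpose rather than by max_cache and min_cache values.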
Demo: Mitigation of Noisy Neighbor Impact with RMD
Permutations of Test Scenarios
• Overlapping COS between:
  – Virtual switch and VMs
  – Multiple VMs
  – OS and virtual switch
• Isolated COS between:
  – Virtual switch and VMs
• DDIO considerations:
  – Exclusive to VMs
  – Exclusive to OS
  – Shared across virtual switch & VMs
• Forced contentions
  – Limited LLC to VM under test
Figure: Permutations of COS association. Diagram labels: Total LLC split across COS 0 to COS 3 for the hypervisor OS, PHY PMDs, vswitchd, VM 1 and VM 2, in three variants: (1) DDIO, isolated LLC; (2) DDIO, isolated OVS LLC, overlapped OVS LLC; (3) DDIO, isolated LLC, overlapped LLC across VMs.
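One way to turn the permutations above into a test matrix is to enumerate them. This is a sketch only: treating the three dimensions as a full cross product is an assumption, and some combinations may not be meaningful on a given platform. The dictionary keys are hypothetical and are not VSPERF configuration names.

```python
from itertools import product

# Options copied from the slide.
COS_LAYOUTS = [
    "overlapped: virtual switch and VMs",
    "overlapped: multiple VMs",
    "overlapped: OS and virtual switch",
    "isolated: virtual switch and VMs",
]
DDIO_PLACEMENTS = [
    "exclusive to VMs",
    "exclusive to OS",
    "shared across virtual switch and VMs",
]
FORCED_CONTENTION = [False, True]  # True = limited LLC for the VM under test

# Assumption: every layout is combined with every DDIO placement and contention setting.
scenarios = [
    {"cos_layout": layout, "ddio": ddio, "forced_contention": forced}
    for layout, ddio, forced in product(COS_LAYOUTS, DDIO_PLACEMENTS, FORCED_CONTENTION)
]

for index, scenario in enumerate(scenarios, start=1):
    print(index, scenario)
```

Pruning this list to the combinations a platform can actually express would be the first step before wiring it into a test run.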
In Summary
• Noisy Neighbor effects are real and here to stay
• RMD provides a REST API for the control, orchestration, or management layer to request LLC for its VMs, containers, or applications
Call To Action
• Enable test cases for VSPERF with various combinations of cache associations
• Scale the test scenarios for your projects with RMD and/or Cloud Stress
Questions?
References
• Cloud Testing with Synthetic Workload Gen: https://www.spirent.com/-/media/White.Papers/Broadband/PAB/Cloud_testing_with_synthetic_workload_generators.pdf
• Virtual Infrastructure Benchmarking: https://www.spirent.com/-/media/White.Papers/Broadband/PAB/Key_Considerations_for_Virtual_Infrastructure_Benchmarking_whitepaper.pdf
• Intro to Intel RDT: https://01.org/intel-rdt-linux/blogs/fyu1/2017/resource-allocationintel%C2%AE-resource-director-technology
• Intro to RMD: https://github.com/intel/rmd
• Deterministic NFV w/ Intel RDT: https://builders.intel.com/docs/networkbuilders/deterministic_network_functions_virtualization_with_Intel_Resource_Director_Technology.pdf