VirtualKnotter: Online Virtual Machine Shuffling for Congestion Resolving in Virtualized Datacenter
Xitao Wen, Kai Chen, Yan Chen, Yongqiang Liu, Yong Xia, Chengchen Hu

Datacenter as Infrastructure

Congestion in Datacenter
• Oversubscription: 2:1 ~ 10:1 ~ 100:1
• Packet loss!
• Queuing delay!
• Degrading throughput!
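Reading the 2:1 ~ 10:1 ~ 100:1 figures as oversubscription ratios (our interpretation of the slide), a quick calculation shows why congestion appears: when every server behind an oversubscribed link transmits at once, each one gets only its line rate divided by the ratio. The rack size and link speeds below are hypothetical.

```python
def oversubscription_ratio(num_servers: int, server_gbps: float, uplink_gbps: float) -> float:
    """Aggregate server-facing capacity divided by the shared uplink capacity."""
    return (num_servers * server_gbps) / uplink_gbps


def worst_case_server_gbps(server_gbps: float, ratio: float) -> float:
    """Bandwidth each server gets if all servers push traffic through the uplink at once."""
    return server_gbps / ratio


# Hypothetical rack: 40 servers with 10 Gbps NICs sharing 40 Gbps of uplink capacity.
ratio = oversubscription_ratio(40, 10.0, 40.0)
print(f"{ratio:.0f}:1 oversubscription")
print(f"{worst_case_server_gbps(10.0, ratio):.1f} Gbps per server when all send")
```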

Outline
• Congestion in the Wild
• General Approaches
• Problem Formulation
• Main Design
• Evaluation

Spatial Pattern
• Hotspot: hot links account for <10% of core links [IMC'10]
  – Spatially unbalanced utilization
• Unbalanced utilization (figure axes: sender, receiver)
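A minimal sketch of how the two spatial statistics on this slide could be computed from a utilization snapshot. The 0.7 "hot" threshold and the link values are hypothetical, not the definitions used in [IMC'10].

```python
from statistics import mean


def hot_link_fraction(utilization: list[float], threshold: float = 0.7) -> float:
    """Share of links whose utilization exceeds a (hypothetical) hot threshold."""
    hot = sum(1 for u in utilization if u > threshold)
    return hot / len(utilization)


def peak_to_mean(utilization: list[float]) -> float:
    """How far the hottest link sits above the average link: a simple imbalance measure."""
    return max(utilization) / mean(utilization)


# Hypothetical snapshot of 12 core-link utilizations (0.0 to 1.0).
core = [0.92, 0.15, 0.20, 0.10, 0.30, 0.25, 0.05, 0.18, 0.12, 0.22, 0.08, 0.14]
print(f"hot links: {hot_link_fraction(core):.0%}")
print(f"peak/mean: {peak_to_mean(core):.1f}")
```

A large peak-to-mean ratio together with a small hot fraction is the "few hot links, many idle links" imbalance that placement changes can exploit.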

Temporal Pattern
• Long congestion events
  – Last for tens of minutes
  – Each event has a clear spatial pattern
(figure axis: core link index)
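To check whether congestion events last tens of minutes, one can scan a per-link utilization trace for runs above a threshold. A small sketch, assuming one sample per minute and a hypothetical 0.8 congestion threshold:

```python
def congestion_events(samples: list[float], threshold: float,
                      interval_s: int) -> list[tuple[int, int]]:
    """Return (start_index, duration_seconds) for each run of samples above threshold."""
    events = []
    start = None
    for i, u in enumerate(samples):
        if u > threshold and start is None:
            start = i                                  # a congestion event begins
        elif u <= threshold and start is not None:
            events.append((start, (i - start) * interval_s))
            start = None                               # the event ends
    if start is not None:                              # event still open at end of trace
        events.append((start, (len(samples) - start) * interval_s))
    return events


# Hypothetical per-minute utilization of one core link.
trace = [0.4] * 5 + [0.9] * 25 + [0.5] * 10 + [0.85] * 3
long_events = [e for e in congestion_events(trace, 0.8, 60) if e[1] >= 600]
print(long_events)
```

The final filter keeps only events lasting at least ten minutes, the timescale the slide highlights.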

Traffic Stability
• Bursty at a fine granularity
  – Not predictable at timescales of tens or hundreds of milliseconds [IMC'10][SIGCOMM'09]
• Predictable at a timescale of tens of minutes
  – 40% to 70% of pairwise traffic can be expected to be stable
  – 90%+ of the traffic aggregated at core links is predictable
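One hypothetical way to quantify "pairwise traffic can be expected to be stable": compare two consecutive tens-of-minutes windows and count the VM pairs whose volume changed by at most a relative tolerance. The 20% tolerance and the traffic values are assumptions; the 40% to 70% figure on the slide may rest on a different definition.

```python
def stable_pair_fraction(prev: dict, curr: dict, tolerance: float = 0.2) -> float:
    """Fraction of VM pairs whose traffic changed by at most `tolerance` (relative)."""
    pairs = set(prev) | set(curr)
    if not pairs:
        return 1.0
    stable = 0
    for pair in pairs:
        before, after = prev.get(pair, 0.0), curr.get(pair, 0.0)
        if before == 0:
            if after == 0:               # silent in both windows: trivially stable
                stable += 1
        elif abs(after - before) / before <= tolerance:
            stable += 1
    return stable / len(pairs)


# Hypothetical per-window traffic volumes (MB) between VM pairs.
prev = {("a", "b"): 100, ("a", "c"): 50, ("b", "c"): 10}
curr = {("a", "b"): 105, ("a", "c"): 80, ("b", "c"): 10}
print(f"{stable_pair_fraction(prev, curr):.0%} of pairs stable")
```

A pair that appears only in the newer window counts as unstable, since nothing in the older window predicted it.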

General Approaches
• Network layer
  – Increase network bandwidth (Fat-tree, BCube, OSA…): expensive; requires upgrading the entire DC network
  – Optimize flow routing (Hedera, MicroTE): not scalable; requires hardware support; depends on rich path diversity
• Application layer
  – Optimize VM placement: scalable; lightweight deployment; suitable for existing oversubscribed networks

Background on Virtualized DC
• Virtualization layer
• VM live migration
  – Keeps the service running continuously while migrating
  – Major cost: transfers 1.1x to 1.4x the VM's memory
(figure: VMs hosted on servers connected through the DC network)
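The 1.1x to 1.4x factor lets one bound the migration traffic of a plan before executing it. A sketch with hypothetical VM memory sizes:

```python
LOW_FACTOR, HIGH_FACTOR = 1.1, 1.4   # live-migration overhead range quoted on the slide


def migration_traffic_gb(vm_memory_gb: list[float]) -> tuple[float, float]:
    """Estimated (low, high) traffic in GB to live-migrate VMs of the given memory sizes."""
    total = sum(vm_memory_gb)
    return LOW_FACTOR * total, HIGH_FACTOR * total


# Hypothetical plan: move three VMs with 4, 8 and 16 GB of memory.
low, high = migration_traffic_gb([4, 8, 16])
print(f"{low:.1f} to {high:.1f} GB of migration traffic")
```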

Optimize VM Placement
• Offload traffic from congested links
(figure legend: active VM, idle VM)

Design Goal
• Mitigate congestion
  – Objective: minimize maximum link utilization (MLU)
• Controllable migration traffic (i.e., moving VMs)
  – Constraint: less than the reduced traffic
• Reasonable runtime overhead
  – Far less than the target timescale (tens of minutes)
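A minimal sketch of the objective and the migration constraint under a simplified model that is our assumption, not the paper's: a two-tier tree where each rack has one uplink, a VM pair split across racks loads both racks' uplinks, and "reduced traffic" is the drop in cross-rack traffic rate multiplied by a planning horizon. The 1.4x factor is the pessimistic end of the live-migration range; all names and numbers are hypothetical.

```python
def rack_loads(placement: dict, traffic: dict) -> dict:
    """Load on each rack uplink: traffic of VM pairs whose endpoints sit in different racks."""
    loads = {}
    for (a, b), rate in traffic.items():
        ra, rb = placement[a], placement[b]
        if ra != rb:                          # the pair crosses the core: both uplinks carry it
            loads[ra] = loads.get(ra, 0.0) + rate
            loads[rb] = loads.get(rb, 0.0) + rate
    return loads


def mlu(placement: dict, traffic: dict, capacity: dict) -> float:
    """Maximum link utilization over all rack uplinks."""
    loads = rack_loads(placement, traffic)
    return max((load / capacity[r] for r, load in loads.items()), default=0.0)


def core_rate(placement: dict, traffic: dict) -> float:
    """Total rate of traffic between VMs in different racks."""
    return sum(rate for (a, b), rate in traffic.items() if placement[a] != placement[b])


def worth_migrating(old: dict, new: dict, traffic: dict, mem_gbit: dict,
                    horizon_s: float, factor: float = 1.4) -> bool:
    """Accept `new` only if its migration traffic is below the core traffic it saves."""
    moved = [vm for vm in old if old[vm] != new[vm]]
    migration_gbit = factor * sum(mem_gbit[vm] for vm in moved)
    saved_gbit = (core_rate(old, traffic) - core_rate(new, traffic)) * horizon_s
    return migration_gbit < saved_gbit


traffic = {("v1", "v3"): 6.0, ("v2", "v4"): 2.0, ("v1", "v2"): 5.0}   # Gbps, hypothetical
capacity = {"r1": 10.0, "r2": 10.0}                                   # Gbps per uplink
old = {"v1": "r1", "v2": "r1", "v3": "r2", "v4": "r2"}
new = {"v1": "r1", "v3": "r1", "v2": "r2", "v4": "r2"}               # swap v2 and v3
mem = {vm: 32.0 for vm in old}                                        # 4 GB = 32 Gbit each
print(mlu(old, traffic, capacity), mlu(new, traffic, capacity))
print(worth_migrating(old, new, traffic, mem, horizon_s=600))
```

Over a ten-minute horizon the swap pays for itself; over a few seconds it would not, which is why the constraint is tied to the target timescale.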

Problem Statement
• Input
  – Topology and routing of physical servers
  – Traffic matrix among VMs
  – Current placement
• Variable & output
  – Optimized placement
• NP-hardness
  – Proof: by reduction from the Quadratic Bottleneck Assignment Problem
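Because the problem is NP-hard, exact search only works on toy inputs. The brute-force sketch below enumerates every distinct assignment of VMs to rack slots under a simplified two-tier model (one uplink per rack; hypothetical names and numbers, not the paper's setup) and keeps the placement with the lowest MLU. The number of assignments grows factorially with the VM count.

```python
from itertools import permutations


def mlu(placement: dict, traffic: dict, capacity: dict) -> float:
    """Maximum rack-uplink utilization; a pair split across racks loads both uplinks."""
    loads = {}
    for (a, b), rate in traffic.items():
        ra, rb = placement[a], placement[b]
        if ra != rb:
            loads[ra] = loads.get(ra, 0.0) + rate
            loads[rb] = loads.get(rb, 0.0) + rate
    return max((load / capacity[r] for r, load in loads.items()), default=0.0)


def exhaustive_placement(vms: list, slots_per_rack: dict, traffic: dict, capacity: dict):
    """Try every distinct VM-to-rack-slot assignment; feasible only for tiny inputs."""
    slots = [rack for rack, k in slots_per_rack.items() for _ in range(k)]
    best, best_cost = None, float("inf")
    for assignment in set(permutations(slots, len(vms))):   # dedupe identical rack tuples
        placement = dict(zip(vms, assignment))
        cost = mlu(placement, traffic, capacity)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


traffic = {("v1", "v3"): 6.0, ("v2", "v4"): 2.0, ("v1", "v2"): 5.0}
best, cost = exhaustive_placement(["v1", "v2", "v3", "v4"], {"r1": 2, "r2": 2},
                                  traffic, {"r1": 10.0, "r2": 10.0})
print(best, cost)
```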

Related Work
• Optimize VM placement
  – Server consolidation [SOSP'07]
  – Fault tolerance [ICS'07]
  – Network scalability [INFOCOM'10]

Inspiration
• Undo each tie gently by carefully reeving the end out of it.
• Stretch the tie violently, making it looser and less tangled.

Two-step Algorithm
• Multiway Θ-Kernighan-Lin (KL): fast and greedy
  – Searches for placements that localize overall traffic
  – May get stuck in a local minimum
• Simulated annealing (SA): fine-grained and randomized
  – Searches for placements that mitigate traffic on the most congested links
  – Helps avoid local minima

Multiway Θ-Kernighan-Lin (KL)
• Top-down graph-cut improvement
• Introduces Θ to limit the number of moves
• O(n² log n)
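The sketch below illustrates only the Θ idea and is not the authors' multiway Θ-KL. It is a flat (not top-down) KL-style pass over rack assignments: it tentatively applies the best unlocked swap up to `theta` times, even when a swap makes things worse, locks the swapped VMs, and keeps the prefix with the largest cumulative reduction in cross-rack traffic. Treating `theta` as a cap on swaps, and recomputing the cut from scratch (far slower than O(n² log n)), are assumptions of this sketch.

```python
def cut_weight(placement: dict, traffic: dict) -> float:
    """Total traffic between VMs placed in different racks (the cut KL tries to shrink)."""
    return sum(rate for (a, b), rate in traffic.items() if placement[a] != placement[b])


def theta_kl_pass(placement: dict, traffic: dict, theta: int):
    """One KL-style pass limited to `theta` swaps; returns (best placement, best gain)."""
    current = dict(placement)
    base = cut_weight(current, traffic)
    locked = set()
    best_gain, best_placement = 0.0, dict(current)
    for _ in range(theta):
        candidate = None
        free = [vm for vm in current if vm not in locked]
        for i, a in enumerate(free):
            for b in free[i + 1:]:
                if current[a] == current[b]:
                    continue                      # swapping within a rack changes nothing
                trial = dict(current)
                trial[a], trial[b] = current[b], current[a]
                weight = cut_weight(trial, traffic)
                if candidate is None or weight < candidate[0]:
                    candidate = (weight, a, b)
        if candidate is None:
            break                                 # no unlocked cross-rack pair remains
        weight, a, b = candidate
        current[a], current[b] = current[b], current[a]   # apply even if the gain is negative
        locked.update((a, b))
        if base - weight > best_gain:             # cumulative gain of this prefix of swaps
            best_gain, best_placement = base - weight, dict(current)
    return best_placement, best_gain


traffic = {("a", "c"): 10, ("b", "d"): 10, ("a", "b"): 1, ("c", "d"): 1}
start = {"a": 0, "b": 0, "c": 1, "d": 1}
improved, gain = theta_kl_pass(start, traffic, theta=2)
print(improved, gain)
```

Smaller values of `theta` bound how many VMs one pass may relocate, which is how a move budget keeps migration traffic controllable.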

Simulated Annealing Searching (SA)
• Randomized global search
• Terminates when a satisfactory solution is obtained or a predefined maximum depth is reached
(figure: candidate placements with MLU = 0.53 and MLU = 0.60)
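A hedged sketch of the SA step: the slide specifies only a randomized global search that stops at a satisfactory MLU or a maximum depth, so the swap neighborhood, initial temperature, geometric cooling schedule, and the two-tier MLU model (one uplink per rack) below are all assumptions.

```python
import math
import random


def mlu(placement: dict, traffic: dict, capacity: dict) -> float:
    """Maximum rack-uplink utilization; a pair split across racks loads both uplinks."""
    loads = {}
    for (a, b), rate in traffic.items():
        ra, rb = placement[a], placement[b]
        if ra != rb:
            loads[ra] = loads.get(ra, 0.0) + rate
            loads[rb] = loads.get(rb, 0.0) + rate
    return max((load / capacity[r] for r, load in loads.items()), default=0.0)


def anneal(placement, traffic, capacity, target_mlu, max_depth,
           temp=1.0, cooling=0.95, seed=0):
    """Randomized swap search on MLU; stops at target_mlu or after max_depth steps."""
    rng = random.Random(seed)
    current = dict(placement)
    cur_cost = mlu(current, traffic, capacity)
    best, best_cost = dict(current), cur_cost
    vms = sorted(current)
    for _ in range(max_depth):
        if best_cost <= target_mlu:
            break                                        # satisfactory solution found
        a, b = rng.sample(vms, 2)
        if current[a] != current[b]:
            current[a], current[b] = current[b], current[a]
            cost = mlu(current, traffic, capacity)
            delta = cost - cur_cost
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cur_cost = cost
                if cost < best_cost:
                    best, best_cost = dict(current), cost
            else:
                current[a], current[b] = current[b], current[a]   # undo the swap
        temp = max(temp * cooling, 1e-9)                 # cool down without reaching zero
    return best, best_cost


traffic = {("v1", "v3"): 6.0, ("v2", "v4"): 2.0, ("v1", "v2"): 5.0}
capacity = {"r1": 10.0, "r2": 10.0}
start = {"v1": "r1", "v2": "r1", "v3": "r2", "v4": "r2"}
best, cost = anneal(start, traffic, capacity, target_mlu=0.5, max_depth=200)
print(best, cost)
```

Tracking the best placement separately from the current one lets the search accept worse moves to escape local minima without losing the best result found.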

Methodology
• Baseline algorithm: clustering-based algorithm
  – Pro: best-known static optimality
  – Con: high runtime and migration overhead
• Metrics
  – MLU reduction without migration overhead
  – Overhead: migration traffic, runtime overhead
  – Simulation results

MLU Reduction without Overhead
VirtualKnotter demonstrates static performance similar to that of Clustering.

Migration Traffic
VirtualKnotter generates significantly less migration traffic than Clustering.

Runtime Overhead
VirtualKnotter demonstrates reasonable runtime overhead.

Simulation Results
• 53% less congestion
Altogether, VirtualKnotter achieves a significant gain in resolving congestion.

Conclusions
• Collaborative VM migration can substantially resolve long-term congestion in DCs
• Trading off optimality against migration traffic is essential to harvesting the benefit
DC networking projects of Northwestern LIST: http://list.cs.northwestern.edu/dcn

Thank you!

Backup

General Approaches (comparison)
Columns: Cost | Hardware Support | Scalability | Other Dependency
Rows: Increase Bandwidth | Optimize Routing | Optimize VM Placement
Entries: High, Low, Yes, No, Varies, Low, Rich path diversity, High, VM deployment

Problem Statement
• Objective
  – Minimize maximum link utilization (MLU)
  – "Cool down the hottest spot"
• Constraints
  – Migration traffic
  – Server hardware capacity
  – Inseparable VMs
• NP-hardness
  – Proof: by reduction from the Quadratic Bottleneck Assignment Problem
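One way to write this formulation in symbols (the notation is ours, not the paper's): V is the set of VMs, S the servers, L the links, p the new placement and p_0 the current one, T_{uv} the traffic rate from VM u to VM v, R_l(s,t) equal to 1 if the route from server s to server t uses link l (0 otherwise), C_l the capacity of link l, M_v the memory size of VM v, α ∈ [1.1, 1.4] the live-migration overhead factor, B the migration-traffic budget, and K_s the number of VMs server s can host (a simplification of server hardware capacity).

```latex
\begin{aligned}
\min_{p}\quad & \max_{l \in L}\; \frac{1}{C_l} \sum_{u, v \in V} T_{uv}\, R_l\bigl(p(u), p(v)\bigr) \\
\text{s.t.}\quad & \sum_{v \in V:\, p(v) \neq p_0(v)} \alpha\, M_v \;\le\; B
  && \text{(migration traffic)} \\
& \bigl|\{\, v \in V : p(v) = s \,\}\bigr| \;\le\; K_s \quad \forall s \in S
  && \text{(server capacity)} \\
& p : V \to S
  && \text{(inseparable VMs)}
\end{aligned}
```

Requiring p to be a function from V to S encodes the inseparable-VM constraint: each VM lands whole on exactly one server.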

Observation Summary
• Unbalanced jam (spatial)
• Long-term congestion (temporal)
• Predictable at a timescale of tens of minutes (stability)

Two-step Algorithm
• Multiway Θ-Kernighan-Lin algorithm (KL)
  – Fast search for an approximation
• Simulated annealing searching (SA)
  – Fine search for a better solution