Optimizing Cost and Performance for Multihoming Lili Qiu

Optimizing Cost and Performance for Multihoming
Lili Qiu, Microsoft Research (liliq@microsoft.com)
Joint work with D. K. Goldenberg, H. Xie, Y. R. Yang (Yale University) and Y. Zhang (AT&T Labs – Research)
ACM SIGCOMM 2004

Multihoming & Smart Routing
• Multihoming
  – A popular way of connecting to the Internet
• Smart routing
  – Intelligently distribute traffic among multiple external links
[Figure: a user connected to ISP 1, ISP 2, …, ISP K]

Potential Benefits
• Improve performance
  – Potential improvement: 25+% [Akella 03]
  – Similar to overlay routing [Akella 04]
• Improve reliability
  – Two orders of magnitude improvement in fault tolerance of end-to-end paths [Akella 04]
• Reduce cost
Q: How can these potential benefits be realized?

Our Goals
• Goal
  – Design effective smart routing algorithms to realize the potential benefits of multihoming
• Questions
  – How to assign traffic to multiple ISPs to optimize cost?
  – How to assign traffic to multiple ISPs to optimize both cost and performance?
  – What are the global effects of smart routing?

Related Work
• Techniques for implementing multihoming
  – BGP peering, DNS-based, NAT-based (e.g., [RFC 2260, Cisco, GCLC 04, Radware, F5])
  – Complementary to our work
• Performance evaluation [Akella 03, Akella 04]
  – Quantifies the potential benefits of multihoming
  – Unaddressed challenge: how to achieve these benefits in practice
• Smart routing
  – Commercial products (e.g., [RouteScience, Internap, Proficient, …])
  – Technical details are unavailable
• Hash-based load balancing [Cao 01, Guo 04]
  – Optimizes neither performance nor cost

Network Model
• Network performance metric
  – Latency (also an indicator of reliability)
  – Extends to alternative metrics: $\log(1/(1-\text{lossRate}))$, or $\text{latency} + w \cdot \log(1/(1-\text{lossRate}))$
• ISP charging models
  – $\text{Cost} = C_0 + C(x)$
    – $C_0$: a fixed subscription cost
    – $C$: a piecewise-linear, non-decreasing function mapping $x$ to cost
    – $x$: charging volume
  – Total-volume-based charging
  – Percentile-based charging (95th percentile)
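
As a rough illustration of this model, the Python sketch below evaluates $C_0 + C(x)$ for a piecewise-linear, non-decreasing $C$ and computes the combined latency/loss metric. The breakpoint representation, the names `isp_cost` and `path_metric`, and the rule of extending the last segment beyond the final breakpoint are assumptions for illustration, not details from the talk.

```python
import math
from bisect import bisect_right


def isp_cost(x, c0, breakpoints):
    """Return C0 + C(x) for a piecewise-linear, non-decreasing C.

    breakpoints: hypothetical format, a list of (volume, cost) pairs sorted by
    volume, with at least two entries. Cost is interpolated linearly between
    breakpoints, and the last segment's slope is extended beyond the final one.
    """
    xs = [v for v, _ in breakpoints]
    cs = [c for _, c in breakpoints]
    if x <= xs[0]:
        return c0 + cs[0]
    i = bisect_right(xs, x) - 1
    i = min(i, len(xs) - 2)  # past the last breakpoint: reuse the final segment
    slope = (cs[i + 1] - cs[i]) / (xs[i + 1] - xs[i])
    return c0 + cs[i] + slope * (x - xs[i])


def path_metric(latency, loss_rate, w=0.0):
    """latency + w * log(1 / (1 - lossRate)); w = 0 reduces to plain latency."""
    return latency + w * math.log(1.0 / (1.0 - loss_rate))


# Example: flat fee 50, then 10 per unit up to 10 units, then 5 per unit.
prices = [(0, 0), (10, 100), (50, 300)]
print(isp_cost(30, 50, prices))  # 250.0
```

The $\log(1/(1-\text{lossRate}))$ form is convenient because, assuming independent losses on each hop, it adds up along a path the same way latency does.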

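Percentile-Based Charging
[Figure: per-interval volumes sorted in ascending order; x-axis: interval (1 … N), y-axis: sorted volume, with position 95%·N marked]
• Charging volume: the traffic in the (95%·N)-th sorted interval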
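
A minimal Python sketch of how the charging volume could be computed from one ISP's per-interval traffic. The slide only says "the (95%·N)-th sorted interval"; rounding a non-integer 95%·N up to the next interval is an assumption here, and real ISPs may round differently.

```python
import math


def charging_volume(volumes, q=0.95):
    """Traffic in the ceil(q*N)-th smallest interval (1-indexed).

    volumes: per-interval traffic sent to one ISP (e.g., bytes per 5 minutes).
    """
    n = len(volumes)
    if n == 0:
        raise ValueError("need at least one interval")
    # round() guards against float noise such as q * n landing just above an integer.
    rank = max(1, math.ceil(round(q * n, 9)))
    return sorted(volumes)[rank - 1]


# 100 intervals carrying 1, 2, ..., 100 units: the 95th sorted interval holds 95.
print(charging_volume(list(range(1, 101))))  # 95
```

Equivalently, the top $(1-q) \cdot N$ intervals do not affect the bill, which is the slack that cost-aware assignment can exploit.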

Why Cost Optimization?
• A simple example:
  – A user subscribes to 4 ISPs whose latencies are uniformly distributed
  – In every interval, the user generates one unit of traffic
• To optimize performance:
  – ISP 1: 1, 0, 0, 0, …
  – ISP 2: 0, 1, 0, 0, …
  – ISP 3: 0, 0, 1, 0, …
  – ISP 4: 0, 0, 0, 1, …
  – 95th percentile = 1 for all 4 ISPs
• Using one ISP: 95th percentile = 1
• Cost(4 ISPs) = 4 × Cost(1 ISP)
Optimizing performance alone could result in high cost!
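
The example can be replayed in a few lines of Python. The sketch assumes N = 100 intervals and replaces the random "fastest ISP" choice with a deterministic rotation; with uniformly distributed latencies each ISP is still the fastest in roughly a quarter of the intervals, far more than the 5% that a 95th-percentile bill ignores.

```python
import math


def charging_volume(volumes, q=0.95):
    # Traffic in the ceil(q*N)-th smallest interval (rounding up is an assumption).
    rank = max(1, math.ceil(round(q * len(volumes), 9)))
    return sorted(volumes)[rank - 1]


N, K = 100, 4  # intervals and ISPs; N is an arbitrary choice for illustration

# Performance-only routing: the single unit of traffic in interval t goes to the
# currently fastest ISP, modeled here as ISP t mod K.
per_isp = [[1 if t % K == k else 0 for t in range(N)] for k in range(K)]
multi = [charging_volume(v) for v in per_isp]
single = charging_volume([1] * N)  # same traffic, all on one ISP

print(multi, single)  # [1, 1, 1, 1] 1
```

With identical price functions, the four-ISP bill is therefore four times the single-ISP bill even though the total traffic is the same.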

Cost Optimization: Problem Specification (2 ISPs)
[Figure: traffic volume per interval over time, intervals 1, 2, …, N]

Cost Optimization: Problem Specification (2 ISPs)
[Figure: traffic volume over time, split between ISP 1 and ISP 2; each ISP's sorted volumes, with charging volumes $P_1$ and $P_2$ marked]
Goal: minimize total cost $= C_1(P_1) + C_2(P_2)$

Issues & Insights
• Challenge: traditional optimization techniques do not work with percentiles
• Key: determine each ISP's charging volume
• Results
  – Let $V_0$ denote the sum of all ISPs' charging volumes
  – Theorem 1: minimizing cost reduces to minimizing $V_0$
  – Theorem 2: $V_0 \ge$ the $\left(1 - \sum_{k=1}^{K} (1 - q_k)\right)$ quantile of the original traffic, where $q_k$ is ISP $k$'s charging percentile and $K$ is the number of ISPs
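
Theorem 2's bound can be evaluated directly from the unsplit traffic trace. The sketch below is a minimal Python illustration; it reuses the round-up rank convention of a single-ISP percentile, which is an assumption, and `v0_lower_bound` is a hypothetical name.

```python
import math


def v0_lower_bound(traffic, qs):
    """Theorem 2: V0 >= the (1 - sum_k (1 - q_k)) quantile of the original traffic.

    traffic: total per-interval traffic before it is split among ISPs.
    qs:      charging percentile q_k of each ISP, e.g. [0.95, 0.95].
    """
    q0 = 1.0 - sum(1.0 - q for q in qs)
    if q0 <= 0:
        # So many burst intervals are allowed that the bound is trivial.
        return 0
    rank = max(1, math.ceil(round(q0 * len(traffic), 9)))
    return sorted(traffic)[rank - 1]


# Two ISPs billed at the 95th percentile: the bound is the 90th percentile.
print(v0_lower_bound(list(range(1, 101)), [0.95, 0.95]))  # 90
```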

Cost Optimization: Problem Specification (2 ISPs)
[Figure: each ISP's sorted volumes, with charging volumes $P_1$ and $P_2$ marked]
$P_1 + P_2 \ge$ 90th percentile of the original traffic

Intuition for the 2-ISP Case
• ISP 1 has 5% of intervals whose traffic exceeds $P_1$
• ISP 2 has 5% of intervals whose traffic exceeds $P_2$
• In any interval where the original traffic (ISP 1 + ISP 2 traffic) exceeds $P_1 + P_2$, at least one ISP exceeds its charging volume, so at most 10% of intervals can exceed $P_1 + P_2$
• Hence $P_1 + P_2 \ge$ 90th percentile of the original traffic
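
The argument can be sanity-checked numerically. The Python sketch below splits random traffic between two ISPs at random and confirms that $P_1 + P_2$ never falls below the 90th percentile of the total. The trace length, value ranges, and round-up percentile convention are arbitrary assumptions, and a passing check is evidence for the bound, not a proof of it.

```python
import math
import random


def pct(values, q):
    # Value in the ceil(q*N)-th smallest interval (rounding up is an assumption).
    rank = max(1, math.ceil(round(q * len(values), 9)))
    return sorted(values)[rank - 1]


def check_two_isp_bound(trials=200, n=100, seed=1):
    """Return True if P1 + P2 >= 90th percentile of total traffic in every trial,
    where P1 and P2 are the 95th-percentile charging volumes of a random split."""
    rng = random.Random(seed)
    for _ in range(trials):
        total = [rng.uniform(0, 100) for _ in range(n)]
        isp1 = [t * rng.random() for t in total]       # random share to ISP 1
        isp2 = [t - a for t, a in zip(total, isp1)]    # remainder to ISP 2
        if pct(isp1, 0.95) + pct(isp2, 0.95) < pct(total, 0.90) - 1e-9:
            return False
    return True


print(check_two_isp_bound())  # True
```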

Sketch of Our Algorithm
1. Determine the charging volume for each ISP
   – Compute $V_0$
   – Find $p_k$ that minimize $\sum_k C_k(p_k)$ subject to $\sum_k p_k = V_0$, using dynamic programming
2. Assign traffic given the charging volumes
   – Non-peak assignment: ISP $k$ is assigned up to $p_k$
   – Peak assignment:
     • First let every ISP $k$ serve its charging volume $p_k$
     • Send all the remaining traffic to an ISP $k$ that has so far burst above $p_k$ in fewer than $(1 - q_k) \cdot N$ intervals
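
The two steps can be sketched in Python as follows. This is a simplified, offline illustration under several assumptions: traffic and charging volumes are discretized into integer units, $V_0$ is given (here taken equal to the Theorem 2 bound for the example trace), non-peak traffic is filled into ISPs in index order, and peak overflow goes to the first ISP with burst budget left. The names `split_v0` and `assign` are hypothetical.

```python
import math


def split_v0(v0_units, cost_fns):
    """DP: integer p_k >= 0 with sum(p_k) == v0_units minimizing sum C_k(p_k).

    cost_fns: one callable per ISP mapping a charging volume (units) to cost.
    """
    best = [0.0] + [math.inf] * v0_units  # best[v]: min cost of v units on ISPs so far
    picks = []                             # picks[k][v]: units given to ISP k in best[v]
    for cost in cost_fns:
        new, pick = [math.inf] * (v0_units + 1), [0] * (v0_units + 1)
        for v in range(v0_units + 1):
            for p in range(v + 1):
                cand = best[v - p] + cost(p)
                if cand < new[v]:
                    new[v], pick[v] = cand, p
        best = new
        picks.append(pick)
    ps, v = [], v0_units  # backtrack from the full volume
    for pick in reversed(picks):
        ps.append(pick[v])
        v -= pick[v]
    return ps[::-1]


def assign(traffic, ps, qs):
    """Split each interval's traffic given charging volumes ps and percentiles qs."""
    n = len(traffic)
    # Intervals each ISP may exceed p_k without raising its charging volume.
    budget = [math.floor((1 - q) * n + 1e-9) for q in qs]
    rows = []
    for v in traffic:
        row, left = [], v
        for p in ps:  # every ISP first serves up to its charging volume
            take = min(p, left)
            row.append(take)
            left -= take
        if left > 0:  # peak interval: send the excess to an ISP that may still burst
            k = next((i for i, b in enumerate(budget) if b > 0), None)
            if k is None:
                raise RuntimeError("no burst budget left; V0 is too small")
            row[k] += left
            budget[k] -= 1
        rows.append(row)
    return rows


traffic = [4] * 95 + [20] * 5  # 5 peak intervals out of N = 100; V0 = 4 (90th pct)
ps = split_v0(4, [lambda p: p * p, lambda p: 2 * p * p])  # hypothetical convex costs
rows = assign(traffic, ps, [0.95, 0.95])
print(ps, rows[0], rows[-1])  # [3, 1] [3, 1] [19, 1]
```

Filling ISPs in index order is only one way to handle non-peak intervals; any split that keeps ISP $k$ at or below $p_k$ leaves the bill unchanged.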

Additional Issues
• Deal with capacity constraints
• Perform integral assignment
  – Similar to bin packing (greedy heuristic)
• Make it online
  – Traffic prediction
    • Exponentially weighted moving average (EWMA)
  – Accommodate prediction errors
    • Update $V_0$ conservatively
    • Add margins when computing charging volumes
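
A minimal Python sketch of the online ingredients. It assumes a simple EWMA predictor with a hypothetical smoothing factor of 0.3, a hypothetical 10% safety margin, and one possible reading of "update $V_0$ conservatively" (never lower it within a billing period); the slide does not specify these parameters or that policy.

```python
def ewma_predict(history, alpha=0.3):
    """Next-interval estimate: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not history:
        raise ValueError("need at least one observation")
    s = history[0]
    for x in history[1:]:
        s = alpha * x + (1 - alpha) * s
    return s


def with_margin(estimate, margin=0.1):
    """Inflate an estimate by a safety margin to absorb prediction errors."""
    return estimate * (1 + margin)


def update_v0(current_v0, new_estimate):
    """Hypothetical conservative policy: V0 may grow during a billing period but never shrinks."""
    return max(current_v0, new_estimate)


print(ewma_predict([0, 10], alpha=0.5))  # 5.0
```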

Optimizing Cost + Performance
• One possible approach: design a metric that is a weighted sum of cost and performance
  – How to determine the relative weights?
• Our approach: optimize performance under cost constraints
  – Use cost optimization to derive upper bounds on the traffic that can be assigned to each ISP
  – Assign traffic to optimize performance subject to those upper bounds
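
One simple way to realize the second step is a greedy placement, sketched in Python below. It is an illustrative heuristic, not necessarily the method used in the paper: each destination's traffic is treated as an indivisible flow, larger flows are placed first (a bin-packing-style choice), and `caps` stands for the per-ISP upper bounds from cost optimization. The input format and names are hypothetical.

```python
def assign_perf(flows, caps):
    """Greedy: put each flow on its lowest-latency ISP that still has room.

    flows: list of (volume, [latency via ISP 0, latency via ISP 1, ...]).
    caps:  per-ISP traffic upper bound for this interval.
    Returns (ISP index per flow, total volume-weighted latency).
    """
    room = list(caps)
    choice = [None] * len(flows)
    total_latency = 0.0
    # Largest flows first, so they are least likely to be squeezed out later.
    for i in sorted(range(len(flows)), key=lambda j: -flows[j][0]):
        volume, latencies = flows[i]
        for k in sorted(range(len(latencies)), key=lambda k: latencies[k]):
            if volume <= room[k]:
                room[k] -= volume
                choice[i] = k
                total_latency += volume * latencies[k]
                break
        else:
            raise ValueError(f"flow {i} fits under no ISP's upper bound")
    return choice, total_latency


flows = [(5, [10, 20]), (5, [10, 30]), (2, [50, 10])]
print(assign_perf(flows, caps=[6, 10]))  # ([0, 1, 1], 220.0)
```

Here the second flow would prefer ISP 0, but ISP 0's cost-derived bound is already nearly used, so it falls back to ISP 1: performance is optimized only within the room that cost optimization allows.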

Evaluation Methodology
• Traffic traces (Oct. 2003 – Jan. 2004)
  – Abilene traces (NetFlow data on Internet2)
    • Red Hat, NASA/GSFC, NOAA Silver Springs Lab, NSF, National Library of Medicine
    • Univ. of Wisconsin, Univ. of Oregon, UCLA, MIT
  – MSNBC Web access logs
• Realistic cost functions [Feb. 2002 Blind RFP]
• Delay traces
  – NLANR traces: 3 months of RTT measurements between pairs of 140 universities
  – Map delay traces to hosts in traffic traces

Baseline Algorithms
1. Round robin
   – In each interval, assign traffic to a single ISP
   – Rotate in a round-robin fashion
2. Equal split
   – In each interval, split traffic equally among ISPs
   – Similar to hash-based load balancing
3. Offline local fractional
   – Minimize the total cost for each interval independently
4. Dedicated links
   – Flat rate, independent of usage
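
For reference, the first two baselines are simple enough to state directly in Python (a sketch; each function returns, per interval, the traffic sent to each of the k ISPs). The offline local fractional and dedicated-link baselines are omitted because they depend on the cost functions and contract terms.

```python
def round_robin(traffic, k):
    """Interval t's traffic goes entirely to ISP t mod k."""
    return [[v if i == t % k else 0 for i in range(k)] for t, v in enumerate(traffic)]


def equal_split(traffic, k):
    """Each interval's traffic is divided evenly across all k ISPs."""
    return [[v / k] * k for v in traffic]


print(round_robin([3, 4, 5], 2))  # [[3, 0], [0, 4], [5, 0]]
print(equal_split([4, 6], 2))     # [[2.0, 2.0], [3.0, 3.0]]
```

Neither baseline looks at prices or charging percentiles, which is exactly the information the cost-aware algorithms use.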

Cost Comparison for Different Traces
Our algorithms significantly outperform the alternatives.

Cost Comparison for Varying Number of Links
For every number of ISPs, our cost optimization performs well.

Cost + Performance Evaluation
Optimizing performance alone often doubles the cost.

Cost + Performance Evaluation (Cont.)
Our dual-metric optimization achieves both low cost and low latency.

Global Effects of Smart Routing
• Selfish nature of smart routing
  – Each user optimizes its own cost and performance without considering its impact on other traffic
  – Need to understand its global effects
• Questions
  – How well does smart routing perform when traffic assignment affects link latency?
  – How well do different smart routing users co-exist?
  – How well do smart routing users co-exist with single-homed users?

Evaluation Methodology
• Abilene traffic traces
• Rocketfuel inter-domain topology
  – 170 nodes, 600 edges
  – With propagation delay and OSPF weights
  – M/M/1 queuing model
• Routing
  – A user selects the best-performing ISP subject to cost constraints
  – Inter-domain: shortest AS hop count
  – Intra-domain: OSPF
• Compute traffic equilibria as in [QYZS 03]
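
Under the M/M/1 model, a link's queueing plus transmission delay is $1/(\mu - \lambda)$ for service rate $\mu$ and arrival rate $\lambda$, so latency grows sharply as load approaches capacity; this is how one user's traffic assignment can change the latency others see. A minimal Python sketch of per-link and per-path delay, assuming a hypothetical 1500-byte mean packet size to convert bit rates into packet rates:

```python
import math


def link_delay(prop_delay_s, load_bps, capacity_bps, mean_pkt_bits=12_000):
    """Propagation delay plus M/M/1 sojourn time 1 / (mu - lambda), in seconds."""
    mu = capacity_bps / mean_pkt_bits   # service rate, packets per second
    lam = load_bps / mean_pkt_bits      # arrival rate, packets per second
    if lam >= mu:
        return math.inf                 # the M/M/1 queue is unstable at or above capacity
    return prop_delay_s + 1.0 / (mu - lam)


def path_delay(links):
    """Sum of link delays along a path; links are (prop_delay_s, load_bps, capacity_bps)."""
    return sum(link_delay(*link) for link in links)


print(link_delay(0.0, 6_000, 12_000))  # 2.0 (mu = 1 pkt/s, lambda = 0.5 pkt/s)
```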

Global Effects: Summary
• The impact of self-interference is small
• Smart routing users co-exist well with each other
• Smart routing users co-exist well with single-homed users

Conclusions
• Contributions
  – First paper on jointly optimizing cost and performance for multihoming
  – Propose a series of novel smart routing algorithms that achieve both low cost and good performance
  – Under traffic equilibria, smart routing improves performance without hurting other traffic
• Future work
  – Further evaluation through Internet experiments
  – Dynamics of interactions among different users
  – Design better charging models

Thank you!
