Network Tomography and Internet Traffic Matrices
Matthew Roughan
School of Mathematical Sciences, University of Adelaide
<matthew.roughan@adelaide.edu.au>

Credits
• David Donoho – Stanford
• Nick Duffield – AT&T Labs-Research
• Albert Greenberg – AT&T Labs-Research
• Carsten Lund – AT&T Labs-Research
• Quynh Nguyen – AT&T Labs
• Yin Zhang – AT&T Labs-Research

Problem
• Have: link traffic measurements
• Want: the demands from source to destination
[Figure: network with nodes A, B and C]

Example application: reliability analysis
Under a link failure, routes change; we want to predict the new link loads.
[Figure: network with nodes A, B and C]

Network Engineering
• What you want to do
  a) Reliability analysis
  b) Traffic engineering
  c) Capacity planning
• What you need to know
  ✓ Network and routing
  ✓ Prediction and optimization techniques
  ? Traffic matrix

Outline
• Part I: What do we have to work with – data sources
  - SNMP traffic data
  - Netflow, packet traces
  - Topology, routing and configuration
• Part II: Algorithms
  - Gravity models
  - Tomography
  - Combination and information theory
• Part III: Applications
  - Network reliability analysis
  - Capacity planning
  - Routing optimization (and traffic engineering in general)

Part I: Data Sources

Traffic Data

Data Availability – packet traces
Packet traces: limited availability, like a high-zoom snapshot
• special equipment needed (O&M expensive even if the box is cheap)
• lower-speed interfaces (only recently OC-192)
• huge amount of data generated

Data Availability – flow-level data
Flow-level data: not available everywhere, like a home movie of the network
• historically poor support from some vendors
• large volume of data (1:100 compared to traffic)
• feature interaction/performance impact

Data Availability – SNMP traffic data
SNMP traffic data: like a time-lapse panorama
• MIB-II (including ifInOctets/ifOutOctets) is available almost everywhere
• manageable volume of data (but poor quality)
• no significant impact on router performance
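To illustrate why these counters are cheap but noisy, here is a minimal sketch of deriving a link load y_i from two successive polls of an octet counter. In MIB-II, ifInOctets and ifOutOctets are 32-bit counters that wrap around, so the sketch corrects for at most one wrap between polls. The function name and the 300 s polling interval are assumptions. If a poll is lost and the counter wraps twice, the result is silently wrong, which is one source of the poor data quality noted above. High-speed interfaces usually expose the 64-bit ifHCInOctets/ifHCOutOctets counters from IF-MIB instead.

```python
COUNTER32_MODULUS = 2**32  # ifInOctets/ifOutOctets are 32-bit wrapping counters


def link_rate_bps(prev_octets, curr_octets, interval_s):
    """Average rate in bits/s between two polls of an SNMP octet counter.

    Hypothetical helper: assumes the counter wrapped at most once between polls.
    """
    delta = curr_octets - prev_octets
    if delta < 0:  # the counter passed 2**32 - 1 and restarted from 0
        delta += COUNTER32_MODULUS
    return 8 * delta / interval_s


# Two polls 300 s apart; the counter wrapped in between (3,000,000 octets sent)
print(link_rate_bps(4_293_967_296, 2_000_000, 300))  # 80000.0
```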

Part II: Algorithms

The problem
[Figure: routes through a router, with numbered links and routes]
We want to compute the traffic $x_j$ along route $j$ from measurements $y_i$ on the links:
$y = Ax$

Underconstrained linear inverse problem
$y = Ax$
• $y$: link measurements
• $x$: traffic matrix
• $A$: routing matrix
There are many more unknowns than measurements.
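To make the underconstraint concrete, consider a hypothetical three-node line A-B-C with shortest-path routing. This example assumes A_ij = 1 when demand j crosses link i and 0 otherwise. There are 6 origin-destination demands but only 4 directed link measurements, and two different traffic matrices produce identical link loads. The topology and traffic values are illustrative only.

```python
# Unknowns: one demand per origin-destination pair (hypothetical 3-node line A-B-C)
OD_PAIRS = ["A->B", "A->C", "B->A", "B->C", "C->A", "C->B"]
# Measurements: one load per directed link
LINKS = ["A->B", "B->A", "B->C", "C->B"]

# Routing matrix: A[i][j] = 1 if demand j is routed over link i
A = [
    [1, 1, 0, 0, 0, 0],  # link A->B carries A->B and A->C
    [0, 0, 1, 0, 1, 0],  # link B->A carries B->A and C->A
    [0, 1, 0, 1, 0, 0],  # link B->C carries A->C and B->C
    [0, 0, 0, 0, 1, 1],  # link C->B carries C->A and C->B
]


def link_loads(A, x):
    """Compute y = A x for a routing matrix given as nested lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


x1 = [10, 5, 8, 7, 3, 4]
x2 = [8, 7, 8, 5, 3, 4]  # shifts 2 units from A->B and B->C onto A->C
print(link_loads(A, x1))  # [15, 11, 12, 7]
print(link_loads(A, x2))  # [15, 11, 12, 7]
```

The link measurements alone cannot distinguish x1 from x2. The algorithms below add prior information (gravity models) or extra constraints to pick one solution.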

Naive approach

Gravity Model
• Assume the traffic between sites is proportional to the traffic at each site:
  $x_1 \propto y_1 y_2$, $x_2 \propto y_2 y_3$, $x_3 \propto y_1 y_3$
• Assumes there is no systematic difference between traffic in LA and NY
  - Only the total volume matters
  - Could include a distance term, but locality of information is not as important in the Internet as in other networks

Simple gravity model
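Here is a minimal sketch of the simple gravity model. It assumes the total traffic entering (ingress) and leaving (egress) the network at each node is known, for instance from SNMP counters on edge links. Each demand is then x_ij = ingress_i * egress_j / total. The node totals are hypothetical. This form also assigns traffic to i = j, that is, traffic entering and leaving at the same node.

```python
def gravity(ingress, egress):
    """Simple gravity estimate: x[i][j] = ingress[i] * egress[j] / total traffic."""
    total = sum(ingress)
    if abs(total - sum(egress)) > 1e-9 * total:
        raise ValueError("total ingress and total egress must match")
    return [[t_in * t_out / total for t_out in egress] for t_in in ingress]


# Hypothetical per-node totals for nodes A, B and C
G = gravity([30, 10, 10], [10, 20, 20])
print(G)  # [[6.0, 12.0, 12.0], [2.0, 4.0, 4.0], [2.0, 4.0, 4.0]]
```

Row sums reproduce the ingress totals and column sums the egress totals. Apart from these totals, source and destination are treated as independent.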

Generalized gravity model
• Internet routing is asymmetric
• A provider can control exit points for traffic going to peer networks
• A provider has much less control over where traffic enters
[Figure: peer links and access links]

Generalized gravity model
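The slides give no formula here, so the following is a hypothetical sketch rather than the talk's generalized gravity model. It encodes one consequence of the peer/access distinction, under the assumption that the provider carries no transit traffic from one peer to another. Peer-to-peer entries are fixed at zero, and the remaining entries are fitted to the measured ingress and egress totals by iterative proportional fitting (IPF). IPF keeps structural zeros at zero. When a feasible solution exists, it converges to the matrix with those totals that is closest in Kullback-Leibler divergence to the starting estimate. The function name, the link classification and all numbers are illustrative.

```python
def generalized_gravity(ingress, egress, is_peer, iters=500):
    """Gravity-style estimate with peer-to-peer transit forbidden (hypothetical sketch).

    ingress[i]: traffic entering at edge link i; egress[j]: traffic leaving at j;
    is_peer[i]: True for a peering link, False for an access link.
    """
    n = len(ingress)
    total = sum(ingress)
    # Start from the simple gravity estimate, with zeros for peer -> peer entries
    x = [[0.0 if is_peer[i] and is_peer[j] else ingress[i] * egress[j] / total
          for j in range(n)] for i in range(n)]
    for _ in range(iters):
        for i in range(n):  # rescale rows to match the ingress totals
            s = sum(x[i])
            if s > 0:
                x[i] = [v * ingress[i] / s for v in x[i]]
        for j in range(n):  # rescale columns to match the egress totals
            s = sum(x[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    x[i][j] *= egress[j] / s
    return x


# Two access links followed by two peering links (hypothetical totals)
ingress = [40.0, 20.0, 20.0, 20.0]
egress = [30.0, 30.0, 25.0, 15.0]
is_peer = [False, False, True, True]
X = generalized_gravity(ingress, egress, is_peer)
for row in X:
    print([round(v, 2) for v in row])
```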

Tomographic approach
[Figure: routes through a router, with numbered links and routes]
$y = Ax$

Direct Tomographic approach
• Under-constrained problem
• Find additional constraints
• Use a model to do so
  - The typical approach is to use higher-order statistics of the traffic to find additional constraints
• Disadvantages
  - Complex algorithm – doesn't scale (~1000 nodes, 10000 routes)
  - Reliance on higher-order statistics is not robust given the problems in SNMP data
  - The model may not be correct, which causes problems
  - Inconsistency between model and solution

Combining gravity model and tomography
[Figure: 1. the gravity solution; 2. the tomo-gravity solution; the tomographic constraints (from link measurements)]
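One way to realize step 2 is assumed here: a weighted least-squares adjustment. Among all x with Ax = y, it picks the one closest to the gravity estimate g, weighting each demand's squared change by 1/g_j. The closed form is x = g + W A^T (A W A^T)^{-1} (y - A g) with W = diag(g). This is a hypothetical pure-Python sketch. The weighting choice, the toy line topology A-B-C and the flat prior are assumptions. The sketch does not enforce x >= 0, which a real implementation would need to handle.

```python
def solve(M, b):
    """Solve the square system M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b_i] for row, b_i in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def tomogravity(A, y, g):
    """Minimize sum_j (x_j - g_j)**2 / g_j subject to A x = y (no x >= 0 constraint)."""
    m, n = len(A), len(g)
    residual = [y[i] - sum(A[i][j] * g[j] for j in range(n)) for i in range(m)]
    # A W A^T with W = diag(g): larger prior demands absorb more of the correction
    AWAt = [[sum(A[i][k] * g[k] * A[l][k] for k in range(n)) for l in range(m)]
            for i in range(m)]
    lam = solve(AWAt, residual)
    return [g[j] + g[j] * sum(A[i][j] * lam[i] for i in range(m)) for j in range(n)]


# Demands A->B, A->C, B->A, B->C, C->A, C->B on the line A-B-C;
# rows are the directed links A->B, B->A, B->C, C->B (all values hypothetical)
A = [[1, 1, 0, 0, 0, 0],
     [0, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 0],
     [0, 0, 0, 0, 1, 1]]
y = [15, 11, 12, 7]  # measured link loads
g = [6.0] * 6        # flat prior standing in for a gravity estimate
x = tomogravity(A, y, g)
print([round(v, 6) for v in x])  # [8.0, 7.0, 7.0, 5.0, 4.0, 3.0]
```

The result reproduces every measured link load exactly while staying as close as possible, in the weighted sense, to the prior.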

Regularization approach
• Minimum mutual information:
  - minimize the mutual information between source and destination
• No information
  - The minimum is independence of source and destination
    - $P(S,D) = P(S)P(D)$
    - $P(D \mid S) = P(D)$
    - this actually corresponds to the gravity model
  - Add tomographic constraints:
    - include additional information as constraints
    - the natural algorithm is one that minimizes the Kullback-Leibler information number of $P(S,D)$ with respect to $P(S)P(D)$
      - maximum relative entropy (relative to independence)
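The identity behind this slide is that the Kullback-Leibler divergence of P(S,D) from P(S)P(D) is exactly the mutual information I(S;D). It is zero precisely when source and destination are independent. The helper below (a hypothetical name) evaluates that objective for a traffic matrix normalized to a joint distribution. It does not perform the constrained minimization itself. A gravity-model matrix scores zero, while a matrix in which each source talks to a single destination does not.

```python
import math


def mutual_information(x):
    """I(S;D) in bits, treating the traffic matrix x[s][d] as a joint distribution.

    Equals the Kullback-Leibler divergence of P(S,D) from P(S)P(D).
    """
    total = sum(sum(row) for row in x)
    p_s = [sum(row) / total for row in x]
    p_d = [sum(col) / total for col in zip(*x)]
    mi = 0.0
    for s, row in enumerate(x):
        for d, v in enumerate(row):
            if v > 0:  # terms with zero traffic contribute 0
                p = v / total
                mi += p * math.log2(p / (p_s[s] * p_d[d]))
    return mi


gravity_tm = [[6, 12, 12], [2, 4, 4], [2, 4, 4]]  # P(S,D) = P(S)P(D)
diagonal_tm = [[1, 0], [0, 1]]                     # each source talks to one destination
print(mutual_information(gravity_tm))   # approximately 0 (rounding error only)
print(mutual_information(diagonal_tm))  # 1.0
```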

Validation
• Results are good: ±20% bounds for larger flows
• Observables are even better

More results
• Large errors are in small flows
• >80% of demands have <20% error
[Figure legend: tomogravity method; simple approximation]

Robustness (input errors)

Robustness (missing data)

Dependence on Topology
[Figure panels: star (20 nodes); clique]

Additional information – Netflow

Part III: Applications

Applications
• Capacity planning
  - Optimize network capacities to carry the traffic, given the routing
  - Timescale: months
• Reliability analysis
  - Test that the network has enough redundant capacity for failures
  - Timescale: days
• Traffic engineering
  - Optimize routing to carry the given traffic
  - Timescale: potentially minutes

Capacity planning
• Plan network capacities
  - No sophisticated queueing (yet)
  - Optimization problem
• Used in AT&T backbone capacity planning
  - For well over a year
  - North American backbone
• Being extended to other networks

Network Reliability Analysis
• Consider the link loads in the network under failure scenarios
  - Traffic will be rerouted
  - What are the new link loads?
• Prototype in use for more than a year
  - Currently being turned from a prototype into a production tool for the IP backbone
  - Allows "what if" questions to be asked about link failures (and span or router failures)
  - Allows comprehensive analysis of network risks
    - Which link is most under threat of overload under likely failure scenarios?
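The "what if" computation above can be sketched as follows. The sketch assumes shortest-path (OSPF-style) routing with unique shortest paths and a known traffic matrix. Each demand is routed on its shortest path, once with all links up and once with a link removed, and the demands are summed per directed link. The topology, IGP weights, demands and helper names are hypothetical, and equal-cost multipath splitting is ignored.

```python
import heapq
from collections import defaultdict


def both_directions(edges):
    """Turn {(u, v): weight} for undirected links into directed links."""
    links = {}
    for (u, v), w in edges.items():
        links[(u, v)] = w
        links[(v, u)] = w
    return links


def shortest_path(links, src, dst):
    """Dijkstra over directed links {(u, v): weight}; returns a node list or None."""
    adj = defaultdict(list)
    for (u, v), w in links.items():
        adj[u].append((v, w))
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def link_loads(links, demands, failed=()):
    """Route every demand on its shortest path; `failed` lists undirected links to drop."""
    down = {frozenset(link) for link in failed}
    up = {link: w for link, w in links.items() if frozenset(link) not in down}
    loads = defaultdict(float)
    for (s, t), volume in demands.items():
        path = shortest_path(up, s, t)
        if path is None:
            raise ValueError(f"demand {s}->{t} is disconnected")
        for u, v in zip(path, path[1:]):
            loads[(u, v)] += volume
    return dict(loads)


# Hypothetical ring A-B-C-D-A with IGP weights, and a small traffic matrix
LINKS = both_directions({("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1, ("D", "A"): 2})
DEMANDS = {("A", "C"): 10, ("A", "B"): 5, ("D", "B"): 4}

normal = link_loads(LINKS, DEMANDS)
after = link_loads(LINKS, DEMANDS, failed=[("B", "C")])
print(normal)  # {('A', 'B'): 15.0, ('B', 'C'): 10.0, ('D', 'C'): 4.0, ('C', 'B'): 4.0}
print(after)   # {('A', 'D'): 10.0, ('D', 'C'): 10.0, ('A', 'B'): 9.0, ('D', 'A'): 4.0}
```

Comparing the two results link by link, against capacities, answers the question of which link is most at risk when B-C fails.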

Example use: reliability analysis

Traffic engineering and routing optimization
• Choosing route parameters that use the network most efficiently
  - In simple cases, load balancing across parallel routes
• Methods
  - Shortest-path IGP weight optimization
    - Thorup and Fortz showed that OSPF weights can be optimized
  - Multi-commodity flow optimization
    - Implementation using MPLS
    - Explicit route for each origin/destination pair
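For the simplest case named above, load balancing one demand across parallel routes, here is a minimal sketch. It assumes no other traffic on those routes and splits the demand in proportion to route capacity. Every route then runs at the same utilization, demand / sum(capacities), which is the smallest achievable maximum utilization. This does not implement IGP weight optimization or multi-commodity flow. The capacities, the demand and the function name are made up.

```python
def balance_parallel(demand, capacities):
    """Split one demand across parallel routes in proportion to route capacity.

    With no other traffic on the routes, this equalizes utilization and so
    minimizes the maximum utilization.
    """
    total = sum(capacities)
    return [demand * c / total for c in capacities]


caps = [10.0, 2.5]  # hypothetical route capacities, e.g. in Gb/s
split = balance_parallel(9.0, caps)
print(split)  # [7.2, 1.8]
print([f / c for f, c in zip(split, caps)])  # equal utilization on both routes (72%)
```

Sending all 9.0 units over the larger route alone would put it at 90% utilization. The proportional split brings both routes to 72%.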

Comparison of route optimizations

Conclusion
• Properties
  - Fast (a few seconds for 50 nodes)
  - Scales (to hundreds of nodes)
  - Robust (to errors and missing data)
  - Average errors ~11%, bounds of 20% for large flows
• Tomo-gravity implemented
  - AT&T's IP backbone (AS 7018)
  - Hourly traffic matrices for > 1 year
  - Being extended to other networks
http://www.maths.adelaide.edu.au/staff/applied/~roughan/

Local traffic matrix (George Varghese)
[Figure panels: 0%, 1%, 5%, 10%; previous case shown for reference]