
Origins of Long Range Dependence: Myths and Legends
Aleksandar Kuzmanovic, 01/08/2001

Outline
• Definitions
• Why is LRD important?
• Heavy tails
• Producing self-similar traffic
• Physical interpretation in LAN and WAN networks
  – Different hypotheses from around 10 papers

On the Self-Similar Nature of Ethernet Traffic, W. Willinger, 1994

Definitions
• Long range dependent (LRD) process: its autocorrelation function is nonsummable
• Self-similar process: scaling behavior of the finite-dimensional distributions
  – X = m^(1-H) * X^(m) in distribution
• Second-order self-similar process: the aggregated processes possess the same non-degenerate autocorrelation (AC) function as the original process
  – X and m^(1-H) * X^(m) have the same AC function
• Self-similar processes have hyperbolically decaying autocorrelation functions
• LRD can be characterized by a single parameter H
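The definitions above can be written out as a worked set of formulas. The notation is an assumption, not taken from the slides: r(k) is the autocorrelation at lag k, X^{(m)} is the process averaged over non-overlapping blocks of size m, and β is the decay exponent of r(k). The link H = 1 - β/2 is the standard relation for second-order self-similar processes.

```latex
\begin{align*}
\text{LRD:}\quad
  & \sum_{k=1}^{\infty} r(k) = \infty \\
\text{Aggregated process:}\quad
  & X^{(m)}_i = \frac{1}{m} \sum_{j=(i-1)m+1}^{im} X_j \\
\text{Self-similarity:}\quad
  & X \overset{d}{=} m^{1-H}\, X^{(m)} \\
\text{Hyperbolic decay:}\quad
  & r(k) \sim c\, k^{-\beta}, \quad 0 < \beta < 1, \quad H = 1 - \frac{\beta}{2}
\end{align*}
```

With 0 < β < 1 the sum of r(k) diverges, which is why a single H in (1/2, 1) is enough to describe the LRD of such a process.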

Heavy tails (Noah effect)
• Heavy-tailed distributions – LLCD (log-log complementary distribution) plot
• Pareto is a typical example
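A minimal sketch of how an LLCD plot exposes a heavy tail, assuming a Pareto distribution with P[X > x] = (k/x)^alpha. The function names, the sample size and the 99% tail cut are illustrative choices, not taken from the slides. On log-log axes the complementary distribution of a Pareto variable is a straight line with slope -alpha, so a least-squares fit to the empirical LLCD points should roughly recover alpha.

```python
import math
import random


def pareto_sample(alpha, k, rng):
    """Inverse-transform sample with P[X > x] = (k / x)**alpha for x >= k."""
    return k / (1.0 - rng.random()) ** (1.0 / alpha)


def llcd_points(samples, tail_cut=0.99):
    """Return (log10 x, log10 P[X > x]) pairs from the empirical distribution.

    The largest 1% of samples are dropped because the empirical tail is
    very noisy there (an illustrative choice, not a rule).
    """
    xs = sorted(samples)
    n = len(xs)
    points = []
    for i, x in enumerate(xs[: int(n * tail_cut)]):
        ccdf = (n - i - 1) / n  # fraction of samples larger than x
        points.append((math.log10(x), math.log10(ccdf)))
    return points


def fit_slope(points):
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    data = [pareto_sample(1.2, 1.0, rng) for _ in range(20000)]
    print("estimated alpha:", -fit_slope(llcd_points(data)))
```

An exponential sample run through the same functions would give a curve that bends downward instead of a straight line, which is the visual test the LLCD plot is used for.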

Producing Self-Similar Traffic
1. Multiplexing ON/OFF sources that have a fixed rate in ON periods and heavy-tailed ON/OFF period lengths
   – Aggregate traffic is fBm with [...]
2. [...] queue model: implies that multiplexing constant-rate connections with Poisson connection arrivals and a heavy-tailed distribution for connection lifetimes would result in self-similar traffic
3. Inter-arrival packet times are i.i.d. Pareto with [...]
   – Considering the corresponding count process (the number of arrivals in consecutive intervals), we have "pseudo self-similar" traffic (Paxson, Floyd), or even self-similar traffic (L. Lipsky)?
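Model 1 can be sketched as a small simulation. This is an illustration under stated assumptions, not the construction used in any of the cited papers: time is slotted, each source sends one unit per slot while ON, period lengths are Pareto samples rounded to whole slots, and all function names are hypothetical. The shape parameters in the usage line (1.7 for ON, 1.2 for OFF) are the values reported later in this deck for the Ethernet source-level study.

```python
import random


def pareto(alpha, k, rng):
    """Inverse-transform sample with P[X > x] = (k / x)**alpha for x >= k."""
    return k / (1.0 - rng.random()) ** (1.0 / alpha)


def onoff_source(n_slots, alpha_on, alpha_off, rng):
    """Per-slot activity (1 = ON, 0 = OFF) of a single source."""
    activity = []
    on = rng.random() < 0.5  # random initial state
    while len(activity) < n_slots:
        alpha = alpha_on if on else alpha_off
        remaining = n_slots - len(activity)
        # Heavy-tailed period length in slots, capped at the remaining horizon
        # so that one extreme sample cannot exhaust memory.
        length = min(remaining, max(1, round(pareto(alpha, 1.0, rng))))
        activity.extend([1 if on else 0] * length)
        on = not on
    return activity


def aggregate_onoff(n_sources, n_slots, alpha_on, alpha_off, rng):
    """Multiplex independent ON/OFF sources: active sources per slot."""
    total = [0] * n_slots
    for _ in range(n_sources):
        for i, a in enumerate(onoff_source(n_slots, alpha_on, alpha_off, rng)):
            total[i] += a
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    traffic = aggregate_onoff(50, 10000, 1.7, 1.2, rng)
    print("mean load per slot:", sum(traffic) / len(traffic))
```

Replacing the Pareto periods with exponential ones in `onoff_source` gives a short-range dependent baseline to compare against.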

Questions we want to answer
• What physical activity causes LRD?
• What is the role of protocols (TCP and MAC layer protocols)?
• What is the role of limited resources (i.e. bandwidth)?
• What model fits best to each of the assumptions?
• What is the largest time scale over which the correlation is present?
• Self-similarity vs. pseudo self-similarity and relevance

Statistical Analysis of Ethernet LAN Traffic at the Source Level, W. Willinger, 1997, I

Statistical Analysis of Ethernet LAN Traffic at the Source Level, W. Willinger, 1997, II
• Model 1 (heavy-tailed ON/OFF activity at the source level) is widely accepted
• Result proven theoretically
• Noah effect (heavy-tailed periods)
  – ON periods: alpha = 1.7
  – OFF periods: alpha = 1.2
• TCP traffic measured most of the time...
• Higher load: H increases
• WAN measurements do not fit into this model
  – connections typically do not stay long

Wide Area Traffic: The Failure of Poisson Modeling, V. Paxson, S. Floyd, 1995
• Summary of ways to produce LRD traffic
• WAN (TCP) traffic for TELNET and FTP applications
  – TELNET connection arrivals appear to be Poisson, but packet arrivals are not
  – A single TELNET connection is LRD
• Model 3: inter-arrival times are i.i.d. Pareto
  – The aggregate is also LRD, but there is no analytical proof (*)
• FTP traffic is also LRD, yet none of the models fit because of limited resources
• Aggregated traffic is not fBm (a single H is not enough)
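Model 3 and its count process can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names and parameter values are hypothetical, and the variance-time method is used here only as one common way to estimate H (it fits the slope of log Var(X^(m)) against log m and takes H = 1 + slope/2).

```python
import math
import random
import statistics


def pareto_interarrival_counts(alpha, k, horizon, slot, rng):
    """Arrivals per slot for a renewal process with i.i.d. Pareto gaps."""
    n_slots = int(horizon / slot)
    counts = [0] * n_slots
    t = 0.0
    while True:
        # Inverse-transform sample: P[gap > x] = (k / x)**alpha for x >= k.
        t += k / (1.0 - rng.random()) ** (1.0 / alpha)
        idx = int(t / slot)
        if idx >= n_slots:
            return counts
        counts[idx] += 1


def variance_time_hurst(series, block_sizes):
    """Estimate H from the variance of block means at several block sizes."""
    xs, ys = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        xs.append(math.log10(m))
        ys.append(math.log10(statistics.pvariance(means)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return 1.0 + (sxy / sxx) / 2.0


if __name__ == "__main__":
    rng = random.Random(1)
    counts = pareto_interarrival_counts(1.5, 1.0, 50000.0, 1.0, rng)
    print("H estimate:", variance_time_hurst(counts, [1, 2, 4, 8, 16, 32, 64]))
```

For independent noise the estimator returns a value near 0.5. The slides distinguish self-similarity from pseudo self-similarity by the range of time scales covered, so an estimate from one range of block sizes says nothing about behavior outside that range.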

Explaining WWW Traffic Self-Similarity, M. Crovella, 1995
• WWW traffic is self-similar, but only when load is high (i.e. in the busiest hours)
• Authors force Model 1 (ON/OFF model)
  – The distributions of:
    – transfer times (alpha = 1.21)
    – user requests for documents (alpha = 1.06)
    – document sizes available on the Web (alpha = 1.05)
    – user think times (alpha = 1.5)
• H increases as the load increases (same as in LAN)

On the Relationship between File Sizes, Transport Protocols, and Self-Similar Network Traffic, M. Crovella, 1996
• Model 1: the success of this simple model is surprising given that it ignores nonlinearities arising in real networks
• Hypothesis:
  – Heavy-tailed file size distributions together with TCP are responsible for LRD
  – If UDP is used, there is little or no LRD
• Explanation
  – "In some sense, the effect of the unaccounted for nonlinearity is reflected back as a stretching in time effect, thus conforming to the model's original suppositions"
• Other interesting stuff: mix of Pareto and exp. background traffic

On the Propagation of LRD in the Internet, A. Veres, 2000, I
• Not about roots, but about propagation of self-similarity by TCP
  – A(t) = C - B(t)
• TCP is a linear system beyond a characteristic time scale
  – if it adapts well to background traffic, it itself becomes self-similar

On the Propagation of LRD in the Internet, A. Veres, 2000, II
• Experimental proof:
  – NY-Budapest file transfer: source is not LRD, traffic is LRD (H = 0.76)
  – Max time scale = 8 min
• Also, if there are a number of ON/OFF TCP connections, they can spread LRD
• W. Willinger obviously does not like this paper:
  – "This is a fraud and has no relevance for LRD observed on link level..."
  – "Protocols have no impact on LRD, they just have to send the data generated by applications..."

TCP Congestion Control and Heavy Tails, M. Crovella, 2000, I
• Switch to Model 3 (heavy-tailed inter-packet arrivals)
• Although heavy-tailed flow lengths are commonly associated with heavy-tailed file sizes, there is no strong correlation between file sizes and transmission times
• It has been shown that TCP can show heavy-tailed inter-arrival times under some conditions
• Because most of the connections are short-lived (!), only slow start and exp. back-off were considered

TCP Congestion Control and Heavy Tails, M. Crovella, 2000, II
• Simple Markov chain model for exp. back-off and slow start, with the loss probability p as a parameter
• State probability with different loss rates
• For alpha to be between 1 and 2, p has to be between 1/8 and 1/4
• ... but for a different model
• p increases => H increases

TCP Congestion Control and Heavy Tails, M. Crovella, 2000, III
• Pathological TCP connections: 15 packets
• Analytical model not that good (bounds are loose)
• For this set-up, correlation up to 1000 sec
• For larger file sizes, up to 200-300 sec
• Under certain conditions, heavy-tailed transmission times can occur even in the absence of any variability in file sizes
• Future work: to consider the variability in round-trip time estimation

On the Autocorrelation Structure of TCP Traffic, Don Towsley, 2000, I
• Answer to the previous two papers:
  – TCP can create self-similarity, but over a finite range of time scales: "pseudo self-similarity"
    – but everything in nature is finite (thus "pseudo")
  – They also criticize the pathological model of the previous paper, but they themselves use a pathological model of a different kind (always packets model)
• Separate Markovian models for Congestion Avoidance (CA) and Time Out (TO)
• Simulated these two models with different loss probability parameters

On the Autocorrelation Structure of TCP Traffic, Don Towsley, 2000, II
• Range of time scales observed from the simulation: (2^6*RTT*(2.5 to 10)) => 2^9*RTT
• Explanation of why the aggregate is self-similar
  – independent bottlenecks (at the edge)
  – an aggregate of independent pseudo-self-similar flows should be self-similar itself (**)

On the Autocorrelation Structure of TCP Traffic, Don Towsley, 2000, III
• About the Veres paper
  – computed loss probability (0.08 to 0.14)
  – TO model predicts H = 0.69-0.72 (really measured 0.74)
  – Time scale goes up to 2^6 RTO (also near the measured value)
• Experiments (file transfers)
  – North-South America
    – Measurements: p = 0.13, H = 0.77, ts = (2^7 to 2^8)*RTT
    – TO model: p = 0.12, H = 0.72, ts = (2^7 to 2^9)*RTT
  – East-West Coast
    – Measurements: p = 0.018, H = 0.86, ts = 2^6*RTT
    – CA model: p = 0.018, H = 0.75, ts = 2^4*RTT
• One should be careful when attributing the origin of traffic characteristics to a specific cause

Protocols Can Make Traffic Appear Self-Similar, Jon Peha, 1997, I
• How a basic retransmission mechanism can cause self-similarity
• No model, only experimental investigation
• Simple single-queue (bottleneck) model
• Input traffic: Poisson; retransmissions are bursty
• As the time scale gets larger, burstiness from the original Poisson traffic decreases, but burstiness from retransmissions stays the same!
• It is unlikely that the retransmission mechanism causes truly self-similar traffic; rather, it causes pseudo self-similarity

Protocols Can Make Traffic Appear Self-Similar, Jon Peha, 1997, II
• Pictorial "proof"

Protocols Can Make Traffic Appear Self-Similar, Jon Peha, 1997, III
• Cut-off time scales observed:
  – 150 Mbps link rate, 500-bit packets, RTT 60 msec
    – TS = 5 minutes
  – 10 Mbps Ethernet, no. of retransmissions = 5, To = 125
    – TS in the range of minutes
  – For larger To, it is possible to reach the time scales measured at Bellcore
  – I have computed the cut-off time scale for the Veres paper
    – 128 Kbps, Tout = 10*RTT = 2 sec, TS = 8 min
• If this effect is found to be as strong in more complex models, this could be a significant cause

The Second-order Characteristics of TCP, J. Y. Boudec, 1996, I
• Pseudo self-similarity (TS = 20-30 sec)
  – Minimum bottleneck bandwidth 34 Mbps (?)
• Two main reasons (both heavy-tailed)
  – Burst length arrivals
  – Round trip time
• Real network measurements
• Figure - missing

The Second-order Characteristics of TCP, J. Y. Boudec, 1996, II
• Even for a 34 Mbps link and utilization of 25%, the arrival bursts are eliminated and the inter-packet times depend on the round trip times
• The aggregate of TCP connections has the same H as a single TCP connection (***)
• "It seems likely that the heavy tailed distributions observed in Willinger's work were a result of, among other things, the heavy tailed distribution of a round trip time"

More on RTTs
• Why are round trip times heavy-tailed?
  – Because of TCP congestion control?
  – Because of retransmissions?
  – Because of the variety of destinations?
• It can be heavy-tailed even without any congestion protocol or different destinations!
  – Measurement and Analysis of LRD Behavior of Internet Packet Delay, M. Borella, Infocom 97
    – Constant UDP transmissions: LRD response
    – Is cross-traffic heavy-tailed?
    – Or multiple bottlenecks assumption?
  – Simple example (not through bandwidth adaptation, but through RTT adaptation)

Summary
• Heavy-tailed parameters
  – File sizes
  – Connection lifetimes
  – Inter-arrival packet times
  – Document sizes available on the Web
  – User think times
  – TELNET packet arrivals
  – Round trip times
• Pseudo self-similarity
  – it should be clear that the range of time scales covered is far beyond the dominant time scales, and as far as packet loss is concerned, this is relevant

Conclusions
• One should be careful when attributing the origin of traffic characteristics to a specific cause
• There is more than one physical activity causing LRD
• The influence of protocols (TCP) is more than relevant
  – Time scales covered are relevant in the generation, time-stretching and propagation hypotheses
• Model 3 (inter-arrival times i.i.d. Pareto) plus heavy-tailed file sizes (introducing congestion) is promising
• Analytical proof for the aggregate is missing (simulation proof reported in 3 papers)
• The round-trip time hypothesis might be promising: it supports the Veres idea in a slightly different way