A Resource-minimalist Flow Size Histogram Estimator
Bruno Ribeiro, Don Towsley (UMass Amherst)
Tao Ye (Sprint)

Flow size histogram
- Flow size
  - Internet core router: TCP flows
  - e.g. # of packets in a TCP flow
- Flow size histogram used for:
  - Traffic profiling
  - Anomaly detection
- Histogram hard to obtain
  - TCP flows: hundreds of millions of flows/hour (OC-48 router)

Estimating flow size histograms
- Random packet sampling is inaccurate [Ribeiro et al. 2006]
- Flow sampling: needs more memory, and an accurate tail needs packet sampling
- Current data streaming methods have slow estimators

Outline
- Related work
- Our resource-minimalist approach
- Experiment
- Conclusions

Related work [Kumar et al. 2004]
- Sketch phase (router): a universal hash function maps each packet to one counter in an array (e.g. 0 1 2 1 0 2 0 0); two flows can land on the same counter (hash collision!!)
- Estimation phase (powerful backend server): recovers the flow size histogram from the counters, accounting for hash collisions
- Complexity: O((maximum flow size)^3)

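The sketch phase described on this slide can be illustrated with a minimal sketch. It is not the implementation of [Kumar et al. 2004]: the function names are hypothetical, and `hashlib.blake2b` is only a stand-in for the universal hash function the slide mentions.

```python
import hashlib


def counter_index(flow_id: str, num_counters: int) -> int:
    """Map a flow ID to a counter index (BLAKE2b as a stand-in hash)."""
    digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_counters


def sketch(packets, num_counters):
    """Increment one counter per packet; colliding flows share a counter."""
    counters = [0] * num_counters
    for flow_id in packets:
        counters[counter_index(flow_id, num_counters)] += 1
    return counters


# Hypothetical usage: five packets from two flows, eight counters.
counters = sketch(["flow-a"] * 3 + ["flow-b"] * 2, 8)
```

Every packet is counted exactly once, so the counters always sum to the number of packets. What the estimation phase has to untangle is which counters hold more than one flow.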
Resource-minimalist approach
- Insight: don't need to count every flow size
  - Idea: group large flow sizes into bins
    - Fine-grained flow histogram < k packets
    - Coarse-grained flow histogram > k packets
  - Approach: probabilistic counting
    - Reduces counters to 6 bits
  - Requires: low collision probability (e.g. counter/flow = 2/1)
  - Result: O(k^3 + log(W)) estimator, e.g., k = 16 and W = 10^7
- Problem: low collision → more memory (2 counters / flow)
  - Approach: counter folding
    - Negligible increase in estimator error
    - Requires one extra bit / counter
  - Result: reduces number of counters by half

Group large flow sizes & probabilistic counting [Morris 78]
- Counter increments (probabilistic):
  - Counter values 0, 1, 2, …, k-1, k: one increment per arrived packet
  - k → k+1: increment with probability p = 1/m_1 (m_1 arrived packets on average)
  - k+1 → k+2: increment with probability p = 1/m_2 (m_2 arrived packets on average)
- Counter value k → flow sizes = [k, k + m_1 - 1]
- Counter value k+1 → flow sizes = [k + m_1, k + m_1 + m_2 - 1]
- With m_a = 2^a, a 6-bit counter bins up to W = 10^14

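The binning rule on this slide can be checked with a short sketch. The function names, the saturation at the 6-bit maximum (63), and the choice k = 16 in the usage lines are assumptions for illustration; only m_a = 2^a and the two flow size ranges come from the slide.

```python
import random


def bin_range(value, k):
    """Flow sizes represented by a counter value, assuming m_a = 2**a."""
    if value < k:
        return (value, value)          # fine-grained part: exact count
    a = value - k + 1                  # counter value k uses bin width m_1
    low = k + sum(2 ** j for j in range(1, a))
    return (low, low + 2 ** a - 1)


def increment(value, k, max_value=63, rng=random):
    """Advance a counter by one packet, probabilistically above k."""
    if value >= max_value:
        return value                   # assumed: saturate at the 6-bit maximum
    if value < k:
        return value + 1               # below k every packet is counted
    a = value - k + 1
    # Move to the next bin with probability 1/m_a, i.e. after m_a packets
    # on average.
    return value + 1 if rng.random() < 1.0 / 2 ** a else value


print(bin_range(16, 16))   # bin for counter value k
print(bin_range(17, 16))   # bin for counter value k+1
print(bin_range(63, 16))   # largest bin of a 6-bit counter
```

With k = 16 the last call shows that the largest 6-bit counter value covers flow sizes beyond 10^14, which agrees with the order of magnitude quoted on the slide.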
Counter folding: detecting some collisions
- Maximum hash value = M
- M/2 counters
  - If hash(packet) < M/2 → red
  - Otherwise (hash(packet) mod M/2) → blue
- Detectable blue-red collision: 1 bit required
- Undetectable collision: flows of the same color that share a counter
[Figure: flows hashed into an array of M/2 counters, with one detectable and one undetectable collision]

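The folding rule above amounts to a two-line mapping. This is a minimal sketch: the function name is hypothetical, and it assumes hash values fall in the range 0 to M-1 with M even.

```python
def fold(hash_value, M):
    """Map a hash value in [0, M) to (counter index, color) over M/2 counters."""
    half = M // 2
    if hash_value < half:
        return hash_value, "red"       # lower half of the hash range
    return hash_value % half, "blue"   # upper half folds onto the same counters


# With M = 16, hash values 3 and 11 share counter 3 but differ in color.
print(fold(3, 16), fold(11, 16))
```

Two flows that share a counter but differ in color can be told apart with one color bit per counter. Two flows with the same hash value cannot.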
Counter folding
- Collision policy:
  - "red flow cannot increment blue counter"
  - "blue flow overwrites red counter"
  - counters = 0 are red
- Counter colors: one extra bit per counter
- Result, e.g. with 1 counter / flow:
  - All red counters are also blue counters = 0
  - Virtually expands the hash table by ≈ 50% (virtual 2 counters / flow)
  - Blue counters evict red counters
    - Flow sampling effect: discards 15% of flows at random
- Folding, an interesting fact:
[Figure: flow sampling versus number of foldings; policy: evict newest flow (color = flow ID)]

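The collision policy quoted on this slide can be sketched as a small class. Several details are assumptions rather than the authors' design: the class and method names are hypothetical, an overwritten counter restarts at 1 for the blue flow, and counters are incremented deterministically instead of with the probabilistic rule from the earlier slide.

```python
RED, BLUE = 0, 1


class FoldedCounters:
    def __init__(self, M):
        self.half = M // 2
        self.values = [0] * self.half
        self.colors = [RED] * self.half   # counters equal to 0 start as red

    def update(self, hash_value):
        """Process one packet whose hash value lies in [0, M)."""
        if hash_value < self.half:
            idx, color = hash_value, RED
        else:
            idx, color = hash_value % self.half, BLUE
        if color == RED:
            if self.colors[idx] == BLUE:
                return                    # red flow cannot increment blue counter
            self.values[idx] += 1
        else:
            if self.colors[idx] == RED:
                self.values[idx] = 0      # blue flow overwrites red counter
                self.colors[idx] = BLUE
            self.values[idx] += 1


fc = FoldedCounters(8)
for h in (1, 1, 5, 1):    # two red packets, one blue packet, one more red packet
    fc.update(h)
print(fc.values[1], fc.colors[1])
```

In this run the red flow's two packets are discarded when the blue flow arrives, and the final red packet is ignored. A red flow evicted this way is dropped from the sketch entirely, which is the flow sampling effect the slide mentions.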
Experiment
- Evaluated with simulations
- Our worst result with Internet core traces
  - 9.5 million flows
  - 8 MB of memory
  - k = 16
  - W = 10^14
- Same accuracy without counter folding requires 13 MB of memory

Conclusions
Insights
- Group large flow sizes using probabilistic counters
- Counter folding
- Fast quasi-random sampling
Our estimator
- Time complexity
  - Sketch phase
    - Universal hash cost
    - Two additions
    - One subtraction
  - Estimation phase
    - O(k^3 + log(W))
- Space complexity
  - ≈ 1/4 memory usage of [Kumar et al. 2004]