Workload Characterization in Web Caching Hierarchies
Guangwei Bai, Carey Williamson
Department of Computer Science, University of Calgary
October 14, 2002, MASCOTS 2002
Talk Outline
1. Problem Statement
2. Experimental Methodology
3. Simulation Results
4. Modeling Results
5. Summary and Conclusions
1. Introduction
• World Wide Web: one of the most popular applications on today's Internet
• Web proxy caching: a technique for improving the performance and scalability of the Internet
Illustration of Web Proxy Cache Filtering Effect
[Figure: Web clients send the original request stream to the Web proxy caching system, which forwards the filtered request stream across the Internet to the Web server]
Example of Web cache filter effect

Arriving request stream (into the Web proxy cache):
Time   ID
0.001  A
0.025  B
0.150  C
0.689  A
0.890  D
1.358  B
1.777  B
2.190  A
2.460  E
…

Filtered request stream (out of the Web proxy cache):
Time   ID
0.001  A
0.025  B
0.150  C
0.890  D
1.358  B
2.460  E
…

Highlighted on this example:
• Frequency-domain effect (the repeated references to A at 0.689 and 2.190 and to B at 1.777 are absorbed by the cache)
• Time-domain effect (the gaps between the requests that remain in the filtered stream change)
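The filtering in this example can be reproduced with a small simulation. The slide does not state the cache configuration, so the sketch below assumes an LRU cache that holds three objects; that assumption happens to yield exactly the filtered stream shown. The function name `lru_filter` is hypothetical and is not taken from the authors' simulator.

```python
from collections import OrderedDict

def lru_filter(requests, capacity):
    """Return the requests that miss in an LRU cache holding `capacity` objects.

    Assumption: capacity is counted in objects, not bytes.
    """
    cache = OrderedDict()   # keys ordered from least to most recently used
    misses = []
    for time, obj in requests:
        if obj in cache:
            cache.move_to_end(obj)           # hit: absorbed by the cache
        else:
            misses.append((time, obj))       # miss: forwarded upstream
            cache[obj] = True
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict the least recently used object
    return misses

# Arriving request stream from the slide
arriving = [(0.001, "A"), (0.025, "B"), (0.150, "C"), (0.689, "A"), (0.890, "D"),
            (1.358, "B"), (1.777, "B"), (2.190, "A"), (2.460, "E")]

# Assumed configuration: LRU, capacity of 3 objects
filtered = lru_filter(arriving, capacity=3)
for time, obj in filtered:
    print(f"{time:.3f} {obj}")
```

Note that B is forwarded twice (at 0.025 and 1.358): under the assumed configuration it is evicted when D arrives, so its next reference misses again.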
Goal of this Work
Time-domain analysis of cache filter effects in Web caching hierarchies:
• Study the impact of a cache on the structural characteristics of the Web request workload (mean, peak, variance, self-similarity)
• Assess the sensitivity of the filter effect to cache configuration (cache size and cache replacement policy)
• Characterize aggregate Web request streams in a multi-level Web proxy caching hierarchy
Multi-Level Web Proxy Caching System
[Figure: hierarchy of Web Proxy Caches 1, 2 and 3, arranged in a child level and a parent level, with request streams labeled 1, 2 and 3]
Experimental Methodology
• Trace-driven simulation
  – Web proxy cache simulator
  – Synthetic Web proxy workloads
    o Controllable characteristics
    o Trace length: about 1 million requests
    o Zipf slope: -0.75, -0.8
    o Request arrival process: deterministic, Poisson, self-similar
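As an illustration of what a synthetic workload with controllable characteristics can look like, the sketch below generates a trace with Zipf-like object popularity and Poisson arrivals. It is not the authors' workload generator. The function name `synthetic_trace` and the parameters `num_objects`, `rate` and `seed` are assumptions introduced for this example, and the deterministic and self-similar arrival processes are not covered.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def synthetic_trace(num_requests, num_objects, zipf_slope, rate, seed=0):
    """Generate (time, object_id) pairs with Zipf-like popularity and Poisson arrivals.

    zipf_slope is negative (e.g. -0.75): the object of rank r is requested
    with probability proportional to r ** zipf_slope.
    """
    rng = random.Random(seed)
    weights = [r ** zipf_slope for r in range(1, num_objects + 1)]
    cumulative = list(accumulate(weights))
    total = cumulative[-1]
    trace, now = [], 0.0
    for _ in range(num_requests):
        now += rng.expovariate(rate)          # exponential gaps give Poisson arrivals
        index = bisect_left(cumulative, rng.random() * total)
        trace.append((now, min(index, num_objects - 1)))   # id 0 is the most popular
    return trace

# Small illustrative run; the talk used traces of about 1 million requests
trace = synthetic_trace(num_requests=1000, num_objects=50, zipf_slope=-0.75, rate=1.0)
```

Object popularity is sampled by inverting the cumulative weights with a binary search, which keeps each draw at O(log n) without external libraries.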
General Observations: Filter Effects
[Figure: arrival counts (requests per 5-minute interval) versus time, and cache hit ratio versus time (sec)]
Effect of Cache Configuration
• Experimental factors:
  – Cache size: determines the maximum number of Web content bytes that can be held in the cache at one time
  – Cache replacement policy: determines which object(s) to remove from the cache when more space is needed to store an incoming object (e.g., RAND, FIFO, LRU, LFU, GDS)
• Assumption: the arrival process is Poisson
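The two experimental factors can be made concrete with a minimal byte-capacity cache sketch. It covers only FIFO and LRU from the policies listed above. The function name `simulate_cache` and the rule that an object larger than the whole cache is never stored are assumptions of this example, not details from the talk.

```python
from collections import OrderedDict

def simulate_cache(requests, capacity_bytes, policy="LRU"):
    """Return the hit ratio for a list of (object_id, size_bytes) requests.

    policy: "LRU" evicts the least recently used object,
            "FIFO" evicts the object that was inserted first.
    """
    if policy not in ("LRU", "FIFO"):
        raise ValueError("unsupported policy: " + policy)
    cache = OrderedDict()   # object_id -> size; the front entry is evicted first
    used = hits = 0
    for obj, size in requests:
        if obj in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(obj)    # refresh recency; FIFO keeps insertion order
            continue
        if size > capacity_bytes:
            continue                      # assumption: oversized objects are not cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)   # make room
            used -= evicted_size
        cache[obj] = size
        used += size
    return hits / len(requests) if requests else 0.0

# Same request sequence, same cache size, two replacement policies
sequence = [("A", 1), ("B", 1), ("A", 1), ("C", 1), ("A", 1)]
lru_ratio = simulate_cache(sequence, capacity_bytes=2, policy="LRU")
fifo_ratio = simulate_cache(sequence, capacity_bytes=2, policy="FIFO")
```

On this short sequence LRU keeps A in the cache because it was referenced recently, while FIFO evicts it as the oldest insertion, so the two policies produce different hit ratios for the same cache size.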
Effect of Cache Size on Traffic Structure
[Figure (a), effect of cache size: marginal distribution plot (pdf), frequency in percent versus requests per 1-minute interval]
Effect of Cache Replacement Policy
[Figure (b), effect of cache policy: frequency versus requests per 1-minute interval; plot label: (8 KB)]
Input: Deterministic Arrival Process

                              Cache Size (MB)
Statistics           Before   1       4       16      64      256     1024
Mean                 60.00    36.88   31.45   28.71   27.31   25.37   23.03
Standard Deviation   0.00     4.84    4.60    4.01    4.00    4.31    4.78
Hit Ratio                     38.8%   47.8%   52.7%   55.5%   59.1%   62.7%

Main Observations:
• Reduces mean arrival rate of the filtered request stream
• Increases variance of the filtered request stream
Input: Poisson Arrival Process

                              Cache Size (MB)
Statistics           Before   1       4       16      64      256     1024
Mean                 60.10    36.81   31.38   28.65   27.26   25.33   23.00
Standard Deviation   7.82     6.77    6.07    5.43    5.31    5.39    5.62
Hit Ratio                     38.8%   47.8%   52.7%   55.5%   59.1%   62.7%

Main Observations:
• Large impact on mean; little impact on variance
• Variance-to-mean ratio increases with cache size
• For small cache sizes, the filtered stream is well-characterized as a Poisson process
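The variance-to-mean ratio mentioned in the observations can be recomputed from the tabulated values. For a Poisson count process the variance equals the mean, so the ratio is 1. The sketch below applies that check to the rounded numbers from the slide, assuming the mean and standard deviation are request counts per interval; `variance_to_mean` is a name chosen for this example.

```python
# Values copied from the Poisson-input table (assumed to be counts per interval)
labels   = ["before", "1 MB", "4 MB", "16 MB", "64 MB", "256 MB", "1024 MB"]
means    = [60.10, 36.81, 31.38, 28.65, 27.26, 25.33, 23.00]
std_devs = [7.82, 6.77, 6.07, 5.43, 5.31, 5.39, 5.62]

def variance_to_mean(mean, std_dev):
    """Index of dispersion of a count series: variance divided by mean."""
    return std_dev ** 2 / mean

ratios = {label: variance_to_mean(m, s)
          for label, m, s in zip(labels, means, std_devs)}

for label, ratio in ratios.items():
    print(f"{label:>8}: {ratio:.3f}")
```

Because the slide reports values rounded to two decimals, the computed ratios are approximate; they are meant for comparing each filtered stream against the Poisson reference value of 1.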
Input: Self-Similar Arrival Process

                              Cache Size (MB)
Statistics           Before   1       4       16      64      256     1024
Mean                 62.87    38.50   32.79   29.88   28.27   26.05   23.49
Standard Deviation   12.24    9.03    7.98    7.12    6.94    7.02    7.14
Hit Ratio                     38.8%   47.8%   52.7%   55.5%   59.1%   62.7%

Main Observations:
• Large impact on mean; little impact on variance
• Variance-to-mean ratio increases with cache size
• Filtered request stream retains self-similar structure
Background: Self-Similar Traffic
• Network traffic self-similarity: the statistical characterization of the traffic is essentially invariant with time scale
• Main measure: Hurst parameter, 0.5 < H < 1
• Examination:
  – autocorrelation (long-range dependence)
  – variance-time plot
  – rescaled adjusted range statistic (R/S)
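Of the three examination methods, the variance-time plot is the simplest to sketch. The series is averaged over non-overlapping blocks of m intervals, log10(variance) is plotted against log10(m), and for a self-similar series the slope equals 2H - 2. The code below is a minimal illustration of that estimator, not the authors' analysis tool. The function name `hurst_variance_time` and the default aggregation levels are assumptions of this example.

```python
import math
import random

def hurst_variance_time(counts, levels=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Estimate H from the slope of log10(variance) versus log10(aggregation level)."""
    xs, ys = [], []
    for m in levels:
        # Average the series over non-overlapping blocks of m intervals
        blocks = [sum(counts[i:i + m]) / m
                  for i in range(0, len(counts) - m + 1, m)]
        mean = sum(blocks) / len(blocks)
        var = sum((b - mean) ** 2 for b in blocks) / (len(blocks) - 1)
        xs.append(math.log10(m))
        ys.append(math.log10(var))
    # Least-squares slope of the variance-time plot
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1 + slope / 2      # slope = 2H - 2 for a self-similar series

# Sanity check on an independent (not self-similar) series
rng = random.Random(1)
independent = [rng.gauss(60, 8) for _ in range(20000)]
h_estimate = hurst_variance_time(independent)
```

An independent series has no long-range dependence, so its estimate should fall near 0.5, the lower end of the range given above; values closer to 1 indicate stronger self-similarity.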
Traffic Characterization in a Web Proxy Caching Hierarchy
• Filter effects of the first-level cache on the Web workload
• Statistical multiplexing of filtered Web request streams after the first-level cache
• Modeling the aggregate request stream offered to the second-level cache
Multi-Level Web Proxy Caching System
[Figure: hierarchy of Web Proxy Caches 1, 2 and 3, arranged in a child level and a parent level, with request streams labeled 1, 2 and 3]
Synthetic Self-Similar Workload Traces offered to the first-level cache
[Figure: requests per interval versus time (sec.) for Trace 1 (H=0.70, Zipf slope=0.75) and Trace 2 (H=0.80, Zipf slope=0.80)]
Evidence of Self-Similar Request Arrival Process for Filtered Web Proxy Workload 1
[Figure: (a) time series, count of arrivals per interval versus time interval; (b) autocorrelation versus lag; (c) variance-time plot, log10(variance) versus log10(aggregation level); (d) R/S pox plot, log10(R/S) versus log10(sample size), H=0.699]
Superposition of Web Workload in time-domain
[Figure: frequency (%) versus request arrivals for streams 1, 2 and 3]
Characteristics of aggregate request arrival process
Evidence of Self-Similarity for Aggregate Request Arrival Process
[Figure: (a) time series, requests per interval versus time (sec.); (b) autocorrelation function versus lag; (c) variance-time plot, log10(variance) versus log10(aggregation level); (d) R/S pox plot, log10(R/S) versus log10(sample size), H=0.76]
Modeling of Aggregate Workload
• Gamma Distribution

\[
f(x) = \frac{\left(\frac{x-\mu}{\beta}\right)^{\gamma-1} e^{-\frac{x-\mu}{\beta}}}{\beta\,\Gamma(\gamma)}
\]

γ : shape parameter
β : scale parameter
μ : location parameter
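The slides do not say how the gamma parameters were fitted. As one possible approach, the sketch below uses the method of moments with a fixed location parameter: the gamma mean is μ + γβ and the variance is γβ², which gives γ = (mean − μ)² / variance and β = variance / (mean − μ). The function names `gamma_pdf` and `fit_gamma_moments` and the default location of 0 are assumptions of this example.

```python
import math

def gamma_pdf(x, shape, scale, location=0.0):
    """Three-parameter gamma density; treated as zero at or below the location."""
    if x <= location:
        return 0.0
    z = (x - location) / scale
    return z ** (shape - 1) * math.exp(-z) / (scale * math.gamma(shape))

def fit_gamma_moments(mean, std_dev, location=0.0):
    """Method-of-moments estimates (shape, scale) for a fixed location.

    Assumption: this is one simple fitting approach, not necessarily
    the one used to produce the results in the talk.
    """
    variance = std_dev ** 2
    shape = (mean - location) ** 2 / variance
    scale = variance / (mean - location)
    return shape, scale

# Example input: mean and standard deviation of the self-similar stream
# after a 1024 MB cache, taken from the simulation results table
shape, scale = fit_gamma_moments(mean=23.49, std_dev=7.14)
density_at_mean = gamma_pdf(23.49, shape, scale)
```

The fit is undefined when the standard deviation is zero, as for the unfiltered deterministic input. For very large shape values `math.gamma` overflows, so a log-domain evaluation with `math.lgamma` would be needed there.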
Summary and Conclusions
• Recap: trace-driven simulation of a Web proxy caching hierarchy, with synthetic Web workloads
• Cache reduces peak and mean request arrival rate
• Cache filter effect does not remove self-similarity
• Superposition of Web request streams results in a bursty aggregate request stream
• Gamma distribution: a flexible and robust means to characterize the request arrival count distribution at different stages in a Web caching hierarchy
Future Work
• Bigger traces, more general workloads
• Studying the mathematical relationships between the gamma (shape) and beta (scale) parameters versus cache size and hit ratio
• For more information:
  – Email: {bai, carey}@cpsc.ucalgary.ca
  – http://www.cpsc.ucalgary.ca/~carey