Simulation Evaluation of a Heterogeneous Web Proxy Caching Hierarchy
Mudashiru Busari, University of Saskatchewan
Carey Williamson, University of Calgary
MASCOTS 2001

Introduction
- “The Web is both a blessing and a curse…”
- Blessing:
  - Internet available to the masses
  - Seamless exchange of information
- Curse:
  - Internet available to the masses
  - Stress on networks, protocols, servers, users
- Motivation: improve the performance and scalability of the Web (e.g., caching)

Example of a Web Proxy Cache
[Figure: a Web server, a proxy server, and two Web clients]

Our Previous Work
- Evaluation of Canada’s national Web caching infrastructure for CANARIE’s CA*net II backbone
- Workload characterization and evaluation of CA*net II Web caching hierarchy (IEEE Network, May/June 2000)
- Developed Web proxy caching simulator for trace-driven simulation evaluation of Web proxy caching architectures
- Developed synthetic Web proxy workload generator called ProWGen [Busari/Williamson INFOCOM 2001]

CA*net II Web Caching Hierarchy (Dec 1998)
(selected measurement points for our traffic analyses; 6-9 months of data from each)
[Figure: hierarchy map with labelled points USask and CANARIE (Ottawa), and a link to NLANR]

Caching Hierarchy Overview
Cache hit ratios (empirically observed), by level:
- Top-Level/International proxy (20-50 GB): 5-10%
- National proxy (10-20 GB): 15-20%
- Regional/Univ. proxy (5-10 GB): 30-40%
[Figure: clients (C) attach to regional/university proxies, which sit below the national and top-level proxies]

Some Observations on Multi-Level Caching...
- Caching hierarchy not very effective
- Reason: workload characteristics change as you move up the caching hierarchy (due to filtering effects, etc.)
- Idea #1: Try different cache replacement policies at different levels of hierarchy
- Idea #2: Limit replication of cache content in overall hierarchy through “partitioning” (size, type, sharing, …)

Research Questions: Multi-Level Caches
- In a multi-level caching hierarchy, can overall caching performance be improved by using different cache replacement policies at different levels of the hierarchy?
- In a multi-level caching hierarchy, can overall performance be improved by keeping disjoint document sets at each level of the hierarchy?

Experimental Methodology
- Trace-driven simulation
- Multi-factor experimental design
- Cache size
  - 1 MB to 32 GB
- Cache Replacement Policy
  - Least-Recently-Used (currently active docs)
  - Least-Frequently-Used (popular docs)
  - Greedy-Dual-Size (favours smaller docs)
- Workload Characteristics
  - Degree of overlap amongst child caches

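Illustrative sketch (not from the original slides): two of the replacement policies listed above, written as small byte-capacity caches. The class names, the `access` method, and the choice of cost = 1 in the Greedy-Dual-Size priority (H = L + 1/size) are assumptions made for this example; the simulator used in the study may differ in its details.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-capacity cache that evicts the least recently used document."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.docs = OrderedDict()  # doc_id -> size, oldest first

    def access(self, doc_id, size):
        """Return True on a hit; on a miss, insert the document."""
        if doc_id in self.docs:
            self.docs.move_to_end(doc_id)  # mark as most recently used
            return True
        if size > self.capacity:
            return False  # too large to cache at all
        while self.used + size > self.capacity:
            _, old_size = self.docs.popitem(last=False)
            self.used -= old_size
        self.docs[doc_id] = size
        self.used += size
        return False


class GDSizeCache:
    """Greedy-Dual-Size with cost = 1 (assumed): evict the smallest H."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # the running value L
        self.docs = {}  # doc_id -> (H, size)

    def access(self, doc_id, size):
        """Return True on a hit; on a miss, insert the document."""
        if doc_id in self.docs:
            # Restore the priority of a re-referenced document.
            self.docs[doc_id] = (self.inflation + 1.0 / size, size)
            return True
        if size > self.capacity:
            return False
        while self.used + size > self.capacity:
            victim = min(self.docs, key=lambda d: self.docs[d][0])
            h, victim_size = self.docs.pop(victim)
            self.inflation = h  # L rises to the evicted priority
            self.used -= victim_size
        self.docs[doc_id] = (self.inflation + 1.0 / size, size)
        self.used += size
        return False
```

With cost = 1, a small document receives a larger H than a large one, so the large document is evicted first; this is the sense in which the policy favours smaller docs.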
Simulation Model
[Figure: Web servers at the top, an upper-level (parent) proxy server, lower-level (child) proxy servers, and Web clients; child workloads shown with complete overlap, partial overlap (50%), and no overlap]

Web Proxy Workload Used
- Synthetically generated workload using ProWGen proxy workload generator [Busari/Williamson INFOCOM 2001]
- Parameterized based on empirical data
- Zipf-like document popularity profile
- Lots of “one-timer” documents
- Heavy-tailed file size distribution
- Note: static content only

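Illustrative sketch (not ProWGen itself): one simple way to draw a request stream with a Zipf-like popularity profile, where the document of popularity rank i is requested with probability proportional to 1/i^slope. The function names and parameters are hypothetical, and one-timers, file sizes, and temporal locality are deliberately left out.

```python
import random


def zipf_weights(num_docs, slope):
    """Unnormalised Zipf-like weights: rank i gets weight 1 / i**slope."""
    return [1.0 / (rank ** slope) for rank in range(1, num_docs + 1)]


def generate_requests(num_requests, num_docs, slope, seed=0):
    """Return a list of document ids (0 = most popular) drawn independently."""
    rng = random.Random(seed)
    weights = zipf_weights(num_docs, slope)
    return rng.choices(range(num_docs), weights=weights, k=num_requests)


requests = generate_requests(num_requests=20000, num_docs=100, slope=0.8)
```

Because requests are drawn independently, this sketch reproduces only the popularity skew, not the other workload properties listed on the slide.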
Workload Characteristics

Parameter                                       Value
Total number of requests                        5,000
Unique documents (of total requests)            34%
One-timers (of unique documents)                72%
Zipf slope                                      0.807
Tail index                                      1.322
Documents in the tail                           22%
Beginning of the tail (bytes)                   10,000
Mean of the lognormal file size distribution    7,000
Standard deviation                              11,000
Correlation between file size and popularity    Zero
LRU Stack Model for temporal locality           Static and Dynamic
LRU Stack Size                                  1,000

Zipf-like Referencing Behaviour
[Figure: two panels. Empirical trace: slope = 0.81. Synthetic trace: slope = 0.83]

Performance Metrics
- Document Hit Ratio
  - Percent of requested docs found in cache (HR)
- Byte Hit Ratio
  - Percent of requested bytes found in cache (BHR)
Notes:
- application-level simulation (files), not network-level (pkts)
- all three caches always identical in size

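Illustrative sketch (not from the original slides): computing both metrics from per-request outcomes. The function name and the (size, was_hit) input format are assumptions made for this example.

```python
def hit_ratios(outcomes):
    """Return (HR, BHR) in percent.

    outcomes: iterable of (size_bytes, was_hit) pairs, one per request.
    """
    requests = hits = total_bytes = hit_bytes = 0
    for size, was_hit in outcomes:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    if requests == 0 or total_bytes == 0:
        return 0.0, 0.0
    return 100.0 * hits / requests, 100.0 * hit_bytes / total_bytes


# Two small hits and two large misses.
hr, bhr = hit_ratios([(1000, True), (9000, False), (1000, True), (9000, False)])
```

In this toy input half of the requests hit, but the hits are the small files, so the byte hit ratio is much lower than the document hit ratio. A policy that favours small documents can show the same trade-off.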
Experiment 1: Different Policies at Different Levels of the Hierarchy (Complete Overlap)
[Figure: (a) Hit Ratio and (b) Byte Hit Ratio, each with Children and Parent panels]

Experiment 2: Sensitivity to Workload Overlap
- The greater the degree of workload overlap amongst the child proxies, the greater the role for the parent cache
- In the “no overlap” scenario, the parent cache has negligible hit ratios, particularly when child caches are large

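Illustrative sketch (not the simulator used in the study): a toy two-level hierarchy in which child misses are forwarded to a parent cache. It assumes unit-size documents, LRU at every level, caches of identical size, and round-robin interleaving of the child streams; all names are hypothetical.

```python
from collections import OrderedDict


class LRU:
    """LRU cache holding at most `capacity` unit-size documents."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.docs = OrderedDict()

    def access(self, doc):
        """Return True on a hit; on a miss, insert the document."""
        if doc in self.docs:
            self.docs.move_to_end(doc)
            return True
        self.docs[doc] = True
        if len(self.docs) > self.capacity:
            self.docs.popitem(last=False)  # evict least recently used
        return False


def parent_hits(child_streams, capacity):
    """Count parent-cache hits when child misses are forwarded upward."""
    children = [LRU(capacity) for _ in child_streams]
    parent = LRU(capacity)
    hits = 0
    for step in zip(*child_streams):  # one request per child per step
        for child, doc in zip(children, step):
            if not child.access(doc):
                hits += parent.access(doc)
    return hits


stream = ["a", "b", "c", "a", "b", "c"]
complete = parent_hits([stream, stream], capacity=10)
disjoint = parent_hits([stream, [d + "'" for d in stream]], capacity=10)
```

With complete overlap, the second child's first request for each document hits at the parent. With disjoint streams and child caches large enough to hold everything, the parent never sees a repeated document, which matches the "no overlap" observation above.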
Experiment 3: Size-based Partitioning
- Partition files across the two levels of the hierarchy based on size (e.g., keep small files at the lower level and large files at the upper level, or vice versa)
- Three size thresholds for “small”...
  - 5,000 bytes
  - 10,000 bytes
  - 100,000 bytes

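Illustrative sketch (not from the original slides): a size-based admission rule for a two-level hierarchy. The function name, the level labels, and the treatment of a file exactly at the threshold as "small" are assumptions made for this example.

```python
def admit(level, size_bytes, threshold_bytes, small_at_lower=True):
    """Return True if a file of this size may be cached at the given level.

    level: "lower" (child proxies) or "upper" (parent proxy).
    small_at_lower: True keeps small files at the lower level and large
    files at the upper level; False reverses the partition.
    """
    is_small = size_bytes <= threshold_bytes  # boundary choice is assumed
    if level == "lower":
        return is_small == small_at_lower
    return is_small != small_at_lower


# A 4,000-byte file with a 5,000-byte threshold stays at the lower level.
lower_ok = admit("lower", 4000, 5000)
upper_ok = admit("upper", 4000, 5000)
```

Under this rule every file is admitted at exactly one level, so the two levels hold disjoint document sets.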
Small files at the lower level; large files at the upper level
[Figure: Children and Parent panels for size thresholds of 5,000 bytes and 10,000 bytes]

Large files at the lower level; small files at the upper level
[Figure: Parent and Children panels for size thresholds of 5,000 bytes and 10,000 bytes]

Summary: Multi-Level Caches
- Different policies at different levels
  - LRU/LFU-Aging at the lower level + GD-Size at the upper level provided improvement in performance
  - GD-Size + GD-Size provided better performance in hit ratio, but with some penalty in byte hit ratio
- Size-threshold approach
  - Small files at the lower level + large files at the upper level provided improvement in performance
  - Reversing this policy offered no performance advantage

Conclusions
- ProWGen is a valuable tool for the evaluation of Web proxy caching architectures, using synthetic workloads
- Existing multi-level caching hierarchies are not always that effective
- “Heterogeneous” caching architectures may better exploit workload characteristics and improve Web caching performance

Future Work
- Extend and improve ProWGen
- Use of packet-level simulations to understand protocol/network-level effects
- Port ProWGen to network emulation testbed at the U of Calgary

For More Information...
- M. Busari, “Simulation Evaluation of Web Caching Hierarchies”, M.Sc. Thesis, Dept of Computer Science, U. Saskatchewan, June 2000
- ProWGen tool:
  - http://www.cs.usask.ca/faculty/carey/software/
- Email: carey@cpsc.ucalgary.ca
  - http://www.cpsc.ucalgary.ca/~carey/