On Filter Effects in Web Caching Hierarchies
Carey Williamson
Department of Computer Science
University of Calgary

Introduction
- “The Web is both a blessing and a curse…”
- Blessing:
  - Internet available to the masses
  - Seamless exchange of information
- Curse:
  - Internet available to the masses
  - Stress on networks, protocols, servers, users
- Motivation: techniques to improve the performance and scalability of the Web

Why is the Web so slow?
- Client-side bottlenecks (PC, modem)
  - Solution: better access technologies
- Server-side bottlenecks (busy Web site)
  - Solution: faster, scalable server designs
- Network bottlenecks (Internet congestion)
  - Solutions: caching, replication; improved protocols for client-server communication

Example of a Web Proxy Cache
[Figure: Web server, proxy server, and two Web clients]

Our Previous Work
- Evaluation of Canada’s national Web caching infrastructure for CANARIE’s CA*net II backbone
- Workload characterization and evaluation of the CA*net II Web caching hierarchy (IEEE Network, May/June 2000)
- Developed a Web proxy caching simulator for trace-driven simulation evaluation of Web proxy caching architectures
- Developed a synthetic Web proxy workload generator called ProWGen [Busari/Williamson INFOCOM 2001]

CA*net II Web Caching Hierarchy (Dec 1998)
(selected measurement points for our traffic analyses; 6-9 months of data from each)
[Figure: caching hierarchy diagram; labels include USask, CANARIE (Ottawa), and a link to NLANR]

Caching Hierarchy Overview
Cache hit ratios (empirically observed), from the top of the hierarchy down to the clients (C):
- Top-Level/International proxy (20-50 GB): 5-10%
- National proxy (10-20 GB): 15-20%
- Regional/Univ. proxies (5-10 GB): 30-40%

Some Observations on Multi-Level Caching...
- Caching hierarchy not very effective, due to a “diminishing returns” effect
- Reason: workload characteristics change as you move up the caching hierarchy (due to filtering effects, etc.)
- Bigger caches aren’t really the answer
- Better caching system design might be...
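
The "diminishing returns" effect can be illustrated with a little arithmetic. The sketch below takes the midpoints of the hit-ratio ranges quoted in the hierarchy overview (35%, 17.5%, 7.5%) and assumes, purely for illustration, that each ratio is measured relative to the requests that actually reach that level. The function name and that interpretation are assumptions, not something the slides state.

```python
def hierarchy_contributions(local_hit_ratios):
    """Fraction of all client requests served at each level, bottom level first.

    Assumes each hit ratio is local, i.e. relative to the requests
    that reach that level after missing at the levels below it.
    """
    reaching = 1.0  # fraction of client requests that reach the current level
    served = []
    for h in local_hit_ratios:
        served.append(reaching * h)   # requests absorbed at this level
        reaching *= (1.0 - h)         # misses continue up the hierarchy
    return served


# Midpoints of the observed ranges: regional, national, top-level.
served = hierarchy_contributions([0.35, 0.175, 0.075])
# Roughly 0.35, 0.11 and 0.04 of all client requests, respectively.
```

Under these assumptions the two upper levels together absorb only about 15% of client requests, even though they are the largest caches.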

Research Goals
- Develop better understanding of cache filter effects (intuitively, quantitatively)
- Try to do something about it!
- Idea #1: Try different cache replacement policies at different levels of hierarchy
- Idea #2: Try partitioning cache content in overall hierarchy based on size or type to limit replication, etc.

Talk Overview
- Background/Motivation
- Understanding Cache Filtering Effects
- Exploiting Cache Filtering Effects
- Summary and Conclusions

Part I: Understanding Cache Filter Effects

Simulation Model
[Figure: Web clients, lower-level (children) proxy servers, upper-level (parent) proxy server, Web servers]

Experimental Methodology
- Trace-driven simulation (empirical traces)
- Multi-factor experimental design
- Cache size
  - 1 MB to 32 GB
- Cache replacement policy
  - Recency-based LRU (currently active docs)
  - Frequency-based LFU-Aging (popular docs)
  - Size-based GD-Size (favours smaller docs)
- Analyze workload characteristics
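
To make the size-based policy concrete, here is a minimal sketch of a GreedyDual-Size cache. The slides only say that GD-Size favours smaller documents. The unit cost per document, the class name, and the linear scan for the eviction victim are assumptions chosen for brevity, not details of the simulator used in this work.

```python
class GDSizeCache:
    """GreedyDual-Size sketch with cost = 1 per document (assumed variant).

    Each cached document carries a priority H = L + 1/size, where L is an
    "inflation" value raised to the priority of the last evicted document.
    Small documents get larger priorities, so large ones are evicted first.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0      # the running L value
        self.docs = {}            # url -> (priority H, size in bytes)

    def access(self, url, size):
        """Return True on a hit, False on a miss. Sizes must be positive."""
        if url in self.docs:
            # Hit: restore the document's priority relative to the current L.
            self.docs[url] = (self.inflation + 1.0 / size, size)
            return True
        if size > self.capacity:
            return False          # too big to cache at all
        while self.used + size > self.capacity:
            # Evict the lowest-priority document (linear scan, for clarity).
            victim = min(self.docs, key=lambda u: self.docs[u][0])
            priority, victim_size = self.docs.pop(victim)
            self.inflation = priority
            self.used -= victim_size
        self.docs[url] = (self.inflation + 1.0 / size, size)
        self.used += size
        return False
```

A production simulator would normally keep the priorities in a heap. The linear scan keeps the sketch short.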

Web Workload Characteristics
- “One-timers” (60-70% of docs are useless!!!)
- Zipf-like document referencing popularity
- Heavy-tailed file size distribution (i.e., most files small, but most bytes are in big files)
- Zero correlation between document size and document popularity (debate!)
- Temporal locality (temporal correlation between recent past and near future references) [Mahanti et al. PER 2000]
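
One-timers matter because the first reference to any document is always a miss, so a document requested only once can never produce a hit. The sketch below, with hypothetical function names and a toy trace, measures the one-timer fraction and the hit ratio that an infinitely large cache would achieve.

```python
from collections import Counter


def one_timer_fraction(trace):
    """Fraction of distinct documents that are requested exactly once."""
    counts = Counter(trace)
    one_timers = sum(1 for c in counts.values() if c == 1)
    return one_timers / len(counts)


def infinite_cache_hit_ratio(trace):
    """Upper bound on hit ratio: every request except first references hits."""
    return (len(trace) - len(set(trace))) / len(trace)


# Toy trace of document identifiers (illustrative only).
trace = ["a", "b", "a", "c", "d"]
print(one_timer_fraction(trace))        # 0.75
print(infinite_cache_hit_ratio(trace))  # 0.2
```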

Zipf-Like Referencing
- An intrinsic “power-law” relationship in the way that humans organize, access, and use information (e.g., library books, English words in text, movie rentals, Web sites, Web pages, ...)
- Plotting item popularity versus relative rank on a log-log scale results in a straight line
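
The straight line on the log-log plot can be checked numerically by fitting a least-squares slope to log(reference count) versus log(rank). The following is a minimal sketch using only the standard library. The function name and the toy trace are illustrative assumptions.

```python
import math
from collections import Counter


def zipf_slope(trace):
    """Least-squares slope of log(count) versus log(rank).

    Needs at least two distinct documents. A slope near -1 corresponds
    to classic Zipf behaviour.
    """
    counts = sorted(Counter(trace).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy trace whose reference counts are exactly 12 / rank: 12, 6, 4, 3.
trace = ["d1"] * 12 + ["d2"] * 6 + ["d3"] * 4 + ["d4"] * 3
slope = zipf_slope(trace)  # very close to -1.0
```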

Example: Zipf-Like Document Popularity Profile for UofS Trace

Quiz Time: What do you get AFTER the cache?
[Figure: four candidate plots, labelled (a) through (d)]

Quiz Time: What do you get AFTER the cache?
Answer: (c)

Simulation Results for Input Workload Traces with Different Initial Zipf Slopes

The Magnitude of the Filter Effect Depends on Cache Size

Filter Effect Depends on Cache Replacement Policy

Filter Effect is Most Pronounced at First-Level Cache
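
The filter effect itself is easy to reproduce in a toy setting: pass a request stream through a cache and look at what comes out the other side. The sketch below assumes unit-size documents, an LRU policy, and a synthetic trace. None of these are the empirical traces or the simulator used in the talk.

```python
from collections import Counter, OrderedDict


def lru_miss_stream(trace, capacity_docs):
    """Pass a trace through an LRU cache and return the requests that miss.

    The returned list is the workload that the next level of the
    hierarchy would see. Documents are assumed to have unit size.
    """
    cache = OrderedDict()
    misses = []
    for url in trace:
        if url in cache:
            cache.move_to_end(url)          # hit: refresh recency
        else:
            misses.append(url)              # miss: forwarded upstream
            cache[url] = True
            if len(cache) > capacity_docs:
                cache.popitem(last=False)   # evict least recently used
    return misses


# One very popular document interleaved with 100 one-timers.
trace = []
for i in range(100):
    trace += ["hot", f"cold{i}"]

before = Counter(trace)
after = Counter(lru_miss_stream(trace, capacity_docs=2))
print(before["hot"], after["hot"])  # 100 1
```

In this toy case the cache absorbs all but the first reference to the popular document, so the stream seen by the parent is nearly flat in popularity.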

Part II: Exploiting Cache Filter Effects

Research Questions: Multi-Level Caches
- In a multi-level caching hierarchy, can overall caching performance be improved by using different cache replacement policies at different levels of the hierarchy?
- In a multi-level caching hierarchy, can overall performance be improved by keeping disjoint document sets at each level of the hierarchy?

Simulation Model
[Figure: Web clients, lower-level (children) proxy servers, upper-level (parent) proxy server, Web servers; three workload scenarios: complete overlap, partial overlap (50%), no overlap]

Performance Metrics
- Document Hit Ratio
  - Percent of requested docs found in cache (HR)
- Byte Hit Ratio
  - Percent of requested bytes found in cache (BHR)
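
The two metrics can diverge sharply when file sizes are heavy-tailed. The sketch below computes both from a list of per-request outcomes. The function name, the input format, and the toy numbers are illustrative assumptions.

```python
def hit_ratios(results):
    """Return (HR, BHR) in percent.

    results: iterable of (size_bytes, was_hit) pairs, one per request.
    """
    requests = hits = total_bytes = hit_bytes = 0
    for size, was_hit in results:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    return 100.0 * hits / requests, 100.0 * hit_bytes / total_bytes


# Three small hits and one large miss (made-up sizes).
results = [(1000, True), (1000, True), (1000, True), (97000, False)]
hr, bhr = hit_ratios(results)
print(hr, bhr)  # 75.0 3.0
```

Missing a single large file barely moves the document hit ratio but dominates the byte hit ratio.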

Experiment 1: Different Policies at Different Levels of the Hierarchy
[Figure: (a) hit ratio and (b) byte hit ratio, for children and parent caches]

Experiment 2: Sensitivity to Workload Overlap
- The greater the degree of workload overlap amongst the child proxies, the greater the role for the parent cache
- In the “no overlap” scenario, the parent cache has negligible hit ratios, particularly when child caches are large

Experiment 3: Size-based Partitioning
- Partition files across the two levels of the hierarchy based on size (e.g., keep small files at the lower level and large files at the upper level, or vice versa)
- Three size thresholds for “small”...
  - 5,000 bytes
  - 10,000 bytes
  - 100,000 bytes
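
A minimal sketch of the "small files low, large files high" variant of this partitioning follows. It assumes one child and one parent cache, LRU replacement at both levels, and an admit-at-exactly-one-level rule. The class and function names are hypothetical, and the real simulator may differ in all of these respects.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-capacity LRU cache (illustrative helper)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.docs = OrderedDict()   # url -> size in bytes

    def lookup(self, url):
        if url in self.docs:
            self.docs.move_to_end(url)   # refresh recency on a hit
            return True
        return False

    def admit(self, url, size):
        if size > self.capacity:
            return                       # cannot fit, do not cache
        while self.used + size > self.capacity:
            _, evicted_size = self.docs.popitem(last=False)
            self.used -= evicted_size
        self.docs[url] = size
        self.used += size


def request(url, size, child, parent, threshold):
    """Return where the document was found: 'child', 'parent' or 'server'."""
    if child.lookup(url):
        return "child"
    if parent.lookup(url):
        return "parent"
    # Miss at both levels: store the document at exactly one level,
    # chosen by size, so the two caches hold disjoint document sets.
    if size <= threshold:
        child.admit(url, size)
    else:
        parent.admit(url, size)
    return "server"
```

Swapping the two `admit` branches gives the reverse arrangement (large files at the lower level).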

Small files at the lower level; large files at the upper level
[Figure: children and parent results for size thresholds of 5,000 bytes and 10,000 bytes]

Large files at the lower level; small files at the upper level
[Figure: parent and children results for size thresholds of 5,000 bytes and 10,000 bytes]

Summary: Multi-Level Caches
- Different policies at different levels
  - LRU/LFU-Aging at the lower level + GD-Size at the upper level provided improvement in performance
  - GD-Size + GD-Size provided better performance in hit ratio, but with some penalty in byte hit ratio
- Size-threshold approach
  - Small files at the lower level + large files at the upper level provided improvement in performance
  - Reversing this policy offered no performance advantage

Conclusions
- Existing multi-level caching hierarchies are not always that effective, due to cache filtering effects
- “Heterogeneous” caching architectures may better exploit workload characteristics and improve Web caching performance

For More Information...
- M. Busari, “Simulation Evaluation of Web Caching Hierarchies”, M.Sc. Thesis, Dept of Computer Science, U. Saskatchewan, June 2000
- C. Williamson, “On Filter Effects in Web Caching Hierarchies”, ACM Transactions on Internet Technology, 2002 (to appear)
- Email: carey@cpsc.ucalgary.ca
  - http://www.cpsc.ucalgary.ca/~carey/