Performance Evaluation of URL Routing for Content Distribution
Performance Evaluation of URL Routing for Content Distribution Networks Ph.D. defense by Zornitza Genova Prodanoff Committee Members: Dr. K. J. Christensen (Major Professor), Dr. M. Varanasi, Dr. R. Perez, Dr. Chari, Dr. Labrador This material is based upon work funded by the National Science Foundation under grant no. 9875177 ZGP 001 (zphddef.ppt - 07/15/03)
Acknowledgements I would like to thank: my major professor, Dr. Ken Christensen; my committee: Dr. Varanasi, Dr. Perez, Dr. Chari, and Dr. Labrador; Dr. Suen for his comments at my proposal defense; my colleagues: K. Yoshigoe, A. Aslam, G. Perrera, and J. Shahbazian; my family ZGP 002
Topics • Motivation • Problem and contributions • URL Routing • Improvements to URL routing • Evaluation of URL signatures (New) • Evaluation of hashing for URL routing (New) • Summary • List of my publications ZGP 003
Motivation “… 2.5 Billion Hours Spent Waiting on the Web in 1998.” - John Roth, chief executive of Nortel Networks, at Telecom '99 ZGP 004
Problem and contributions
Problem: Excessive delay in the Internet caused by the inability to efficiently access distributed content in the Web
My contributions:
1) Architected a new URL router that uses HTTP redirection
2) Investigated a new use of CRC32 for reducing the size of routing tables
3) Investigated a new self-adjusting hashing method for faster URL routing look-up
4) Performed the first queuing evaluation of hashing - effects of correlation discovered ZGP 005
Topics • Motivation • Problem and contributions • URL Routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP 006
URL routing • Next generation Internet - Content Distribution Networks - A CDN is an overlay network on the Internet - A CDN co-locates content throughout the world • CDNs are of great commercial and research interest - $15 million in NSF funding for Web services research - Akamai is one major CDN provider ZGP 007
URL routing continued Figure: Global content distribution in a CDN. Labeled elements: clients, proxy cache, transparent cache, Internet, reverse cache, origin site, distributed servers. URLs shown: http://www.some.com/page, http://214.29.2.15/page, http://334.249.2.8/page ZGP 008
URL routing continued Figure: HTTP redirection in a CDN. Labeled elements: clients, proxy cache, URL router, distributed server, reverse cache, origin site. (1) HTTP request and redirect (2) HTTP re-request and response ZGP 009
URL routing continued Architecture of a new URL router. Figure: a one-armed URL router connected by network links to a Layer 3 switch; HTTP requests come in and redirects go out.
Routing Table:
URL 1    loc 1 (state), loc 2 (state), … loc M1 (state)
URL 2    loc 1 (state), loc 2 (state), … loc M2 (state)
…        …
URL N    loc 1 (state), loc 2 (state), … loc MN (state) ZGP 010
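The routing table above maps each URL to a list of locations with a state. A minimal sketch of how such a table could drive a redirect follows. The `Location` class, the `"up"`/`"down"` state labels, the first-usable selection rule, and the 302 status code are all assumptions made for illustration; the slides say only that HTTP redirection is used.

```python
from dataclasses import dataclass


@dataclass
class Location:
    base_url: str  # where a copy of the content lives
    state: str     # hypothetical state label: "up" or "down"


def route(table: dict[str, list[Location]], path: str) -> str:
    """Return a raw HTTP response redirecting `path` to the first usable location."""
    for loc in table.get(path, []):
        if loc.state == "up":
            return (
                "HTTP/1.1 302 Found\r\n"
                f"Location: {loc.base_url}{path}\r\n"
                "Content-Length: 0\r\n\r\n"
            )
    # No routing entry, or no location currently usable.
    return "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"


# Example table: one URL with two locations (addresses are illustrative).
table = {
    "/page": [
        Location("http://10.0.0.1", "down"),
        Location("http://214.29.2.15", "up"),
    ]
}
response = route(table, "/page")
missing = route(table, "/other")
```

Selecting the best source from state and client location is listed in the deck as an open problem, so the first-usable rule here is only a placeholder.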
URL routing continued Need to exchange routing tables (digesting)
§ Summary Cache [17] – Use Bloom filters to “merge” routing (hash) tables
§ Bloom filter is probabilistic and does not support updates
- False positives if non-unique hashes
- Results in a “routing collision” in the context of URLs ZGP 011
URL routing continued Need to do look-ups in routing tables • Why use hashing? – Build routing tables as hash tables for efficient look-up • Idea of self-adjusting hash – Most frequently used keys are closer to the head » If chained hashing: rearrange after key accesses » Transposition rule for lists [50], [7] » Move-to-front rule for lists [33] • Review of H1 hashing [74] – Self-adjusting by using transposition ZGP 012
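To make the false-positive behavior concrete, here is a minimal Bloom filter sketch. It assumes bit positions are taken from 32-bit slices of an MD5 digest of the URL; the slicing scheme, filter size, and number of hash functions are illustrative choices, not details taken from the slides or from [17].

```python
import hashlib


class BloomFilter:
    """Bit-array set summary: no false negatives, but false positives are possible."""

    def __init__(self, num_bits: int, num_hashes: int = 4):
        if not 1 <= num_hashes <= 4:
            raise ValueError("an MD5 digest provides at most four 32-bit slices")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url: str):
        digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes
        for i in range(self.num_hashes):
            chunk = digest[4 * i: 4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, url: str) -> None:
        for p in self._positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, url: str) -> bool:
        # True for every added URL; may also be True for a URL never added
        # when other URLs happen to have set the same bits.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(url))


digest = BloomFilter(num_bits=1024)
digest.add("http://www.some.com/page")
```

Because bits are shared among entries, one URL cannot be removed by clearing its bits without risking the removal of others, which is the update limitation noted above.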
URL routing continued Chained resolution of hash table collision. Figure: a table with indices 0 … m-1; an index points to a chain of (key, record) pairs (k0, r0), (k1, r1), (k2, r2), …, (kn-1, rn-1). The hashing collision at index 0 causes the chain to be created ZGP 013
URL routing continued H1 and Simple hashing algorithms, based on [37]
C1. [Create lists] For i ← 0 to m-1, set LISTi ← NULL.
C2. [Hash] Set i ← h(KEY), j ← 0.
C3. [Is there a list?] If LISTi = NULL, go to C6.
C4. [Compare] If KEY = LISTi[j], terminate.
C5. [Advance to next] If LISTi[j] ≠ NULL, set j ← j+1 and go to step C4.
C6. [Insert new key] Set LISTi[j] ← KEY.
C4A. [Compare and transpose – H1 hashing, used in place of C4] If KEY = LISTi[j] and j > 0, swap LISTi[j] with LISTi[j-1] and terminate. Else, if KEY = LISTi[j], terminate. ZGP 014
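Steps C1 to C6, with C4A as the H1 variant, can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the author's code: chains are Python lists, `bucket_of` is a hypothetical stand-in for the unspecified hash function h(KEY), and the return value (number of comparisons) is added here only to make the cost visible.

```python
import zlib


def bucket_of(key: str, m: int) -> int:
    # Assumption: CRC32 stands in for the unspecified hash function h(KEY).
    return zlib.crc32(key.encode("utf-8")) % m


def lookup(lists, key, transpose=False):
    """Steps C2-C6: find `key` in its chain, inserting it if absent.

    Returns the number of key comparisons made. With transpose=True, a hit
    at position j > 0 swaps the key with its predecessor (step C4A, H1).
    """
    chain = lists[bucket_of(key, len(lists))]
    for j, stored in enumerate(chain):
        if stored == key:
            if transpose and j > 0:
                chain[j - 1], chain[j] = chain[j], chain[j - 1]
            return j + 1
    chain.append(key)          # step C6: insert at the end of the chain
    return len(chain) - 1      # comparisons made before the insertion


m = 1  # a single bucket forces every key into one chain, for illustration
simple = [[] for _ in range(m)]  # step C1
h1 = [[] for _ in range(m)]
for k in ["/a", "/b", "/c"]:
    lookup(simple, k)
    lookup(h1, k, transpose=True)

cost_simple = lookup(simple, "/c")            # hit at position 2, chain unchanged
cost_h1 = lookup(h1, "/c", transpose=True)    # hit at position 2, then moves up one
```

After the final look-ups the Simple chain is still `/a, /b, /c`, while the H1 chain has become `/a, /c, /b`, so a repeated request for `/c` costs one comparison less under H1.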
URL routing continued Now begin my contributions in digesting and hashing (and evaluation thereof) ZGP 015
Topics • Motivation • Problem and contributions • URL routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP 016
Improvements to URL routing Open problems: 1) Select best source based on state (and location of client) 2) Reduce the size of the routing table to update/share 3) Perform fast routing look-ups My problems: 2) and 3) ZGP 017
Improvements to URL routing continued • My idea… − Use CRC32 for URL signatures • CRC32 circuitry is already part of an Ethernet adapter – Serial shift-register with wrapped XOR terms • Use to get CRC32 signatures for the URL in an HTTP request header • Need to calculate a CRC32 over a subfield [53] – The subfield is the URL in an HTTP request header ZGP 018
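As a software illustration of a URL signature, the sketch below extracts the URL from the request line of an HTTP request and computes its CRC32 with Python's `zlib.crc32`, which implements the same CRC-32 used for the Ethernet frame check sequence. The slides propose reusing the adapter's hardware CRC; computing it in software, and signing only the request target rather than host plus path, are assumptions made here.

```python
import zlib


def url_signature(request: bytes) -> int:
    """Return the 32-bit CRC32 signature of the URL in an HTTP request."""
    request_line = request.split(b"\r\n", 1)[0]
    method, url, version = request_line.split(b" ")
    return zlib.crc32(url)


req = b"GET /page HTTP/1.1\r\nHost: www.some.com\r\n\r\n"
sig = url_signature(req)  # a 4-byte value stored in place of the URL string
```

A routing table keyed by such signatures stores 4 bytes per entry regardless of URL length, which is where the size reduction comes from.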
Improvements to URL routing continued Define the following:
– P is the CRC32 generator polynomial
– Ai, i = 1, …, m, is a polynomial (bit sequence)
– Figure: packet layout (packet header, subfield, rest of packet) with segments labeled A0, A1, A2
– We store in a table (for all possible M) the remainders …, where M is the length of the subfield ZGP 019
Improvements to URL routing continued We have the following: … (returned by the adapter, from the CRC32 shift register) and … (what we want: the CRC32 for the subfield) ZGP 020
Improvements to URL routing continued For … the following properties apply: … ZGP 021
Improvements to URL routing continued Solve for RA2 as follows… Let A3 be A0 shifted left M bits. Then … 32-bit multiply and … ZGP 022
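The derivation relies on properties of polynomial remainders modulo the generator P. The sketch below demonstrates two standard ones over GF(2): the remainder is linear, and the remainder of a polynomial shifted left by M bits can be obtained from its own remainder and a stored remainder of x^M using one carry-less multiply. It works with the plain remainder only (no initial value, final XOR, or bit reflection as in the full CRC-32 procedure), and the function names and table range are assumptions; it is not a reconstruction of the exact equations in [53].

```python
P = 0x104C11DB7  # CRC-32 generator polynomial, including the x^32 term


def poly_mod(a: int, p: int = P) -> int:
    """Remainder of polynomial a divided by p over GF(2) (bits are coefficients)."""
    plen = p.bit_length()
    while a.bit_length() >= plen:
        a ^= p << (a.bit_length() - plen)  # cancel the current leading term
    return a


def poly_mul(a: int, b: int) -> int:
    """Carry-less (GF(2)) product of two polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def shifted_remainder(r_a: int, shift_table: dict, m: int) -> int:
    """Remainder of A * x^m, computed from R(A) and the stored R(x^m) alone."""
    return poly_mod(poly_mul(r_a, shift_table[m]))


# Stored remainders of x^m for every shift length of interest (range is illustrative).
shift_table = {m: poly_mod(1 << m) for m in range(0, 513)}

A = int.from_bytes(b"GET ", "big")
B = int.from_bytes(b"/page", "big")

linear_ok = poly_mod(A ^ B) == (poly_mod(A) ^ poly_mod(B))
shift_ok = shifted_remainder(poly_mod(A), shift_table, 40) == poly_mod(A << 40)
```

Both checks evaluate to `True`: the first is linearity of the remainder, the second is the table-plus-multiply shortcut for shifted polynomials.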
Improvements to URL routing continued • My idea… − Aggressive hashing to perform fast look-up » Self-adjusting chained collision resolution » Fast way to do hash table look-ups » Based on move-to-front rule for lists [33], [50] ZGP 023
Improvements to URL routing continued The new Aggressive hashing algorithm (New)
C1. [Create lists] For i ← 0 to m-1, set LISTi ← NULL.
C2. [Hash] Set i ← h(KEY), j ← 0.
C3. [Is there a list?] If LISTi = NULL, go to C6.
C4. [Compare] If KEY = LISTi[j], terminate.
C5. [Advance to next] If LISTi[j] ≠ NULL, set j ← j+1 and go to step C4.
C6. [Insert new key] Set LISTi[j] ← KEY.
C4B. [Compare and move-to-front – Aggressive hashing, used in place of C4] If KEY = LISTi[j] and j > 0, set TEMP ← LISTi[j]; for k ← j down to 1, set LISTi[k] ← LISTi[k-1]; set LISTi[0] ← TEMP; terminate. Else, if KEY = LISTi[j], terminate.
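The difference between step C4B (move-to-front) and the transposition step of H1 shows up when the same key is requested repeatedly. The sketch below applies both rules to a single chain; the function names and the returned comparison counts are illustrative additions, not part of the slides.

```python
def lookup_move_to_front(chain, key):
    """Aggressive hashing on one chain: a hit moves the key to the head."""
    for j, stored in enumerate(chain):
        if stored == key:
            if j > 0:
                chain.insert(0, chain.pop(j))
            return j + 1           # comparisons made
    chain.append(key)
    return len(chain) - 1


def lookup_transpose(chain, key):
    """H1 hashing on one chain: a hit swaps the key with its predecessor."""
    for j, stored in enumerate(chain):
        if stored == key:
            if j > 0:
                chain[j - 1], chain[j] = chain[j], chain[j - 1]
            return j + 1
    chain.append(key)
    return len(chain) - 1


mtf_chain = ["/a", "/b", "/c", "/d"]
h1_chain = ["/a", "/b", "/c", "/d"]

# Three back-to-back requests for the key at the tail of the chain.
mtf_cost = sum(lookup_move_to_front(mtf_chain, "/d") for _ in range(3))
h1_cost = sum(lookup_transpose(h1_chain, "/d") for _ in range(3))
```

For this burst, move-to-front pays 4 + 1 + 1 = 6 comparisons while transposition pays 4 + 3 + 2 = 9, since the requested key climbs only one position per hit.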
Topics • Motivation • Problem and contributions • URL routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP 025
Evaluation of URL signatures Evaluation done with trace-driven simulation Response variables: 1) Probability of false hits due to signature collisions 2) CPU time required to generate URL signatures 3) Reduction in processing and memory resources for URL look-up ZGP 026
Evaluation of URL signatures continued Input data used in the evaluation: § Obtained lists of URLs from 9 cache and server HTTP logs – Access lists generated by me – URL lists – CRC32 lists § Unique URLs range from 70 to 2.5 million (1.5 to 146 MBytes) § Each log covers a continuous period of months § Full URL string or CRC32 signature lists were built § 2.1 GBytes of ASCII format raw data was used ZGP 027
Evaluation of URL signatures continued Input data characteristics
Access list name   Number accesses   Number URLs   Mean URL length (B)   Full URL list size (bytes)   CRC32 list size (bytes)
www.peak.org                16,374            70                 23.93                        1,675                       280
SDMA                        41,941           153                 33.76                        5,165                       612
UVA                        318,899        45,816                 44.91                    2,057,625                   183,264
NLANR                      944,028       504,967                 58.44                   29,510,135                 2,019,868
UC Berkeley              1,791,349       149,344                 41.87                    6,253,716                   597,376
mcs.net                  1,862,070        75,361                 29.87                    2,250,829                   301,444
hyperreal.org            4,080,590        86,338                 89.17                    7,698,337                   345,352
CA*net II                4,642,861     2,552,045                 57.83                  147,573,556                10,208,184
USF CSEE                 8,819,454        49,029                 51.84                    2,541,483                   196,116
ZGP 028
Evaluation of URL signatures continued Experiments on the performance of CRC32 • Experiment #1: Number of CRC collisions was measured – CRC32 generated for each URL – Non-unique CRC32s counted • Experiment #2: Measured CPU time to generate CRC32 URL list – Software CRC generation (8-bit look-up coded in “C”) • Experiment #3: Measured CPU time required for look-up – All entries from access list were looked up in URL list – URL list is a Simple chained hash table ZGP 029
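Experiment #2 uses software CRC generation with an 8-bit look-up table. The slides mention a C implementation; the sketch below shows the same byte-at-a-time technique in Python for illustration, using the reflected CRC-32 polynomial 0xEDB88320 and checking the result against `zlib.crc32`. It is a generic textbook formulation, not the code used in the experiments.

```python
import zlib


def make_table():
    """Precompute the CRC-32 remainder for each possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift out one bit; XOR in the reflected polynomial if that bit was set.
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc32_bytewise(data: bytes) -> int:
    """CRC-32 computed one byte at a time with a 256-entry look-up table."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


url = b"http://www.some.com/page"
signature = crc32_bytewise(url)
matches_zlib = signature == zlib.crc32(url)
```

Each input byte costs one table look-up, one shift, and two XORs, which is consistent with a per-URL generation time that is small compared with the look-up itself.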
Evaluation of URL signatures continued Results for experiment #1
Access list name   Collisions measured   Calculated value   Pr[collision] measured
www.peak.org                         0                  0                0.0000000
SDMA                                 0                  0                0.0000000
UVA                                  0                  1                0.0000000
NLANR                               68                 59                0.0001347
UC Berkeley                          2                  5                0.0000134
mcs.net                              0                  1                0.0000000
hyperreal.org                        2                  2                0.0000463
CA*net II                         1558               1516                0.0006105
USF CSEE                             2                  1                0.0000408
Measured and theoretical are close ZGP 030
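The slides do not state how the calculated values were obtained. One simple estimate, assuming CRC32 values are uniformly distributed over 2^32 possibilities, is that each of n unique URLs shares its signature with some other URL with probability about (n - 1)/2^32, giving roughly n(n - 1)/2^32 colliding URLs. The sketch below evaluates that estimate and the measured collision probability (collisions divided by unique URLs); both function names are hypothetical, and agreement with a few table entries does not confirm this is the formula the author used.

```python
def expected_colliding_urls(n: int, bits: int = 32) -> float:
    """Approximate count of URLs whose signature equals that of another URL.

    Assumes signatures are uniform over 2**bits values and n is much
    smaller than 2**bits.
    """
    return n * (n - 1) / 2 ** bits


def measured_probability(collisions: int, n: int) -> float:
    """Measured collisions per unique URL."""
    return collisions / n


canet_estimate = expected_colliding_urls(2_552_045)   # CA*net II unique URLs
nlanr_estimate = expected_colliding_urls(504_967)     # NLANR unique URLs
canet_measured = measured_probability(1558, 2_552_045)
```

Under this assumption the estimates round to 1516 for CA*net II and 59 for NLANR, and the measured CA*net II probability comes out near 0.0006105.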
Evaluation of URL signatures continued Results for experiment #2
Access list      Time for URL list (millisec)   Time per URL (μsec)
www.peak.org                              <10                    --
SDMA                                      <10                    --
UVA                                        40                0.8730
NLANR                                     460                0.9109
UC Berkeley                               100                0.6695
mcs.net                                    40                0.5307
hyperreal.org                             120                1.3897
CA*net II                                2390                0.9368
USF CSEE                                   40                0.8158
Time per URL string is small (μsec) ZGP 031
Evaluation of URL signatures continued Results for experiment #3 Figure: look-up time (sec, 0 to 0.6) versus H value (10 to 22) for Full URL and CRC32 URL signatures. CRC32 URL signature is better ZGP 032
Evaluation of URL signatures continued Experiments for CRC32 vs. MD5-Bloom filter digesting • Experiment #1: Measured digest size and generation CPU time – MD5-Bloom filter – CRC32 – 32-bit checksum – Lempel-Ziv (LZ) compression (used pkzip25) • Experiment #2: Measured digest size and CPU time – MD5-Bloom • Experiment #3: Measured collisions – Control variable is URL length – MD5-Bloom vs. CRC32 – URL length is a maximum of 25, 30, …, 80 bytes ZGP 033
Evaluation of URL signatures continued Experiments for CRC32 vs. MD5-Bloom filter digesting (continued) • Experiment #4: Measured digest size of the hash chain method – Based on the number of components – Tree structure of 32 bits for a <depth, hash code> pair ZGP 034
Evaluation of URL signatures continued Results for experiments #1 and #2
                         CA*net list                                     CSE list
Method (Load Factor)     CPU time (sec)   Size (Mbytes)   Collisions (%)   CPU time (sec)   Size (Mbytes)   Collisions (%)
MD5-Bloom (8)                     89.13            9.74             0.03             1.63            0.19             0.00
CRC32                             16.22            9.74             0.03             0.27            0.19             0.00
32-bit checksum                   14.85            9.74             0.71             0.24            0.19             0.22
LZ compression                    17.35           16.43             0.00             0.23            0.25             0.00
MD5-Bloom (8)                     89.13            9.74             0.03             1.63            0.19             0.00
MD5-Bloom (16)                    92.37           19.47             0.00             1.71            0.37             0.00
MD5-Bloom (32)                    97.40           38.94             0.00             1.84            0.75             0.00
Similar CRC32 and Bloom filter collisions ZGP 035
Evaluation of URL signatures continued Results for experiment #3 Figure: collisions (%) versus URL length (25 to 75 bytes) for MD5-Bloom and CRC32. Collisions are the same for CRC32 and Bloom filter ZGP 036
Evaluation of URL signatures continued Results from experiment #4 • Hash chaining results in digests that are, on average, 212% larger than CRC32 digests. Substantially larger than the other methods ZGP 037
Evaluation of URL signatures continued Discussion of results • CRC32 URL signatures reduce the size of URL lists and speed up look-up in a hash table – Require less network bandwidth to transfer – Require less memory for storage in the URL router • For CRC32 the number of collisions was found to be small • CRC32 digests require less CPU than MD5-Bloom filter digests and produce the same collisions ZGP 038
Topics • Motivation • Problem and contributions • URL routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP 039
Evaluation of hashing for URL routing continued Look-up time experiments: • Experiment #1: Effect of hash table size on look-up time (NASA access list) • Experiment #2: Effect of hash table size (in K) on look-up time (Clark.net access list) ZGP 040
Evaluation of hashing for URL routing continued Hash table look-up time for experiment #1 Figure: mean look-up time (0 to 60) versus hash table size K (8 to 13) for Simple, H1, and Aggressive. For dense hash tables Aggressive is better than H1 ZGP 041
Evaluation of hashing for URL routing continued Hash table look-up time for experiment #2 Figure: mean look-up time (0 to 40) versus K (8 to 13) for Simple, H1, and Aggressive. Similar to experiment #1 results ZGP 042
Evaluation of hashing for URL routing continued • Evaluation model (single server queue): arrivals are URLs to be looked up; queued URLs wait for the server; the server is a hash table look-up • Response variables: – mean queuing delay – drop in utilization ZGP 043
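A single-server queue of this kind can be evaluated from a trace with the Lindley recursion, where each request's waiting time depends on the previous request's waiting time, its service time (here, the look-up time), and the gap until the next arrival. The sketch below is a generic illustration with made-up numbers; it is not the simulation model used in the thesis.

```python
def waiting_times(service, gaps):
    """Waiting time in queue for each request of a FIFO single-server queue.

    service[n] is the look-up time of request n; gaps[n] is the time between
    the arrivals of requests n and n + 1 (so len(gaps) == len(service) - 1).
    Lindley recursion: W[n+1] = max(0, W[n] + service[n] - gaps[n]).
    """
    waits = [0.0]
    for n, gap in enumerate(gaps):
        waits.append(max(0.0, waits[n] + service[n] - gap))
    return waits


def mean(values):
    return sum(values) / len(values)


# One slow look-up followed by two fast ones, with arrivals one time unit apart.
waits = waiting_times(service=[3.0, 1.0, 1.0], gaps=[1.0, 1.0])
mean_wait = mean(waits)
```

The single slow look-up delays both requests behind it (waits of 0, 2, and 2 time units), which is why runs of slow look-ups matter more to queueing delay than the mean look-up time alone suggests.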
Evaluation of hashing for URL routing continued Mean queue length experiments:
• Experiment #1: Effect of hash table size (K) on queue length (L) for utilization U = 80% (Simple chain) and exponential arrivals
• Experiment #2: Effect of burstiness (Tmax) on L for U = 80% (Simple chain) and K = 8
• Experiment #3: Effect of burstiness (Tmax) on L for U = 80% and K = 8
• Experiment #4: Effect of autocorrelation (unshuffled and shuffled ordering of requests) on L for U = 80% and K = 8
• Experiment #5: Effect of autocorrelation (unshuffled and shuffled ordering of requests) on L for U = 80% (Simple chain) and K = 8 ZGP 044
Evaluation of hashing for URL routing continued Results for experiment #1 Figure: L (0 to 6) versus K (8 to 13) for Simple, H1, and Aggressive. Self-adjusting methods show similar performance ZGP 045
Evaluation of hashing for URL routing continued Results for experiment #2 Figure: L (0 to 40) versus Tmax (50 to 1000) for H1 and Aggressive; the Simple hashing value range is 5500 to 34000. H1 shows faster increase in L ZGP 046
Evaluation of hashing for URL routing continued Results for experiment #3 Figure: L (0 to 120K) versus Tmax (50 to 1000) for H1, Aggressive, and Simple. H1 has orders of magnitude worse queue length ZGP 047
Evaluation of hashing for URL routing continued Results for experiment #4
Algorithm    unshuffled   shuffled   M/G/1
Simple             5.20       3.15    3.13
H1             29102.01       8.58    8.57
Aggressive       294.09       9.93    9.76
H1 has orders of magnitude worse queue length ZGP 048
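For reference, the mean number in system of an M/G/1 queue follows from the Pollaczek-Khinchine formula, which depends only on the utilization and the squared coefficient of variation of the service time and ignores any correlation between successive service times. The sketch below evaluates that formula; whether the M/G/1 column in the slides reports the number in system or the number waiting, and how it was computed, is not stated, so this is only a generic illustration.

```python
def mg1_mean_in_system(rho: float, scv: float) -> float:
    """Pollaczek-Khinchine mean number in system for an M/G/1 queue.

    rho is the utilization (0 <= rho < 1) and scv is the squared coefficient
    of variation of the service time (variance divided by squared mean).
    """
    if not 0 <= rho < 1:
        raise ValueError("utilization must satisfy 0 <= rho < 1")
    return rho + rho ** 2 * (1 + scv) / (2 * (1 - rho))


# Sanity checks at 80% utilization.
mm1 = mg1_mean_in_system(0.8, 1.0)   # exponential service: rho / (1 - rho)
md1 = mg1_mean_in_system(0.8, 0.0)   # deterministic service
```

Because the formula assumes independent service times, agreement with a shuffled trace and disagreement with an unshuffled one points to correlation in the look-up times rather than to their distribution.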
Evaluation of hashing for URL routing continued Results for experiment #5
Algorithm    U        unshuffled   shuffled
Simple       80.0%          5.20       3.15
H1           21.7           0.43       0.36
Aggressive   12.9           0.18
ZGP 049
Evaluation of hashing for URL routing continued Discussion of results • Aggressive hashing improves upon H1 hashing – Modest look-up time improvement – Significant improvement from a queueing perspective • Queueing must be used for evaluating hashing algorithms • Long-range dependence (LRD) in the look-up time of H1 results in extreme queueing delay – Catastrophic effects on any application ZGP 050
Topics • Motivation • Problem and contributions • URL routing • Improvements to URL routing • Evaluation of URL signatures • Evaluation of hashing for URL routing • Summary • List of my publications ZGP 051
Summary In summary, I have addressed the problem of excessive delay in the Internet caused by the inability to efficiently access distributed content in the Web. My work has shown that: 1) A URL router that uses HTTP redirection is feasible 2) CRC32 can be used for digesting of URL routing tables 3) Aggressive hashing improves upon existing hashing algorithms in fast look-up 4) Queueing behavior needs to be considered when evaluating hashing algorithms Four publications have resulted ZGP 052
List of my related publications
1. Z. Genova and K. Christensen, “Managing Routing Tables for URL Routers in Content Distribution Networks,” submitted to the International Journal of Network Management in June 2003
2. Z. Genova and K. Christensen, “Efficient Summarization of URLs using CRC32 for Implementing URL Switching,” Proceedings of the 27th IEEE Conference on Local Computer Networks (LCN), pp. 343-344, November 2002
3. Z. Genova and K. Christensen, “Using Signatures to Improve URL Routing,” Proceedings of IEEE International Performance, Computing, and Communications Conference, pp. 45-52, April 2002
4. Z. Genova and K. Christensen, “Challenges in URL Switching for Implementing Globally Distributed Web Sites,” Proceedings of the Workshop on Scalable Web Services, pp. 89-94, August 2000 ZGP 053