Caching and Content Distribution Networks
Web Caching
- As an example, we use the web to illustrate caching and related issues.
[Figure: browsers send requests to a web proxy with a cache; the proxy forwards requests to web servers and relays their responses back to the browsers.]
Web Browser Caching
- Web browsers have their own caches. When a page is downloaded from a site, it is put into the browser cache.
- This is especially useful when the back button is pressed.
- If a new copy is needed, a "refresh" can be done.
- No page stays permanently in the cache, because room is limited.
  - A replacement algorithm is needed to determine which cached page should be purged.
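The slide does not name a replacement algorithm. As one common choice, the sketch below assumes least-recently-used (LRU) eviction; the class name `LRUPageCache` and its methods are hypothetical, not taken from any browser.

```python
from collections import OrderedDict


class LRUPageCache:
    """Tiny in-memory page cache that purges the least recently used page."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._pages = OrderedDict()  # url -> page body, least recent first

    def get(self, url):
        """Return the cached page, or None on a miss."""
        if url not in self._pages:
            return None
        self._pages.move_to_end(url)  # mark as most recently used
        return self._pages[url]

    def put(self, url, body):
        """Store a page, purging the least recently used one if the cache is full."""
        self._pages[url] = body
        self._pages.move_to_end(url)
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # drop the least recently used entry


cache = LRUPageCache(capacity=2)
cache.put("/a", "page A")
cache.put("/b", "page B")
cache.get("/a")            # "/a" becomes the most recently used page
cache.put("/c", "page C")  # the cache is full, so "/b" is purged
```

Other policies (FIFO, least frequently used, size-aware schemes) fit the same interface; only the choice of which entry to drop changes.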
Why Web Caching
- Reduce latency
  - A request served from a cache does not have to go to the server.
  - A request served on the client side avoids network communication altogether.
- Reduce traffic
Consistency
- What if the page changes after it has been saved in the cache?
  - The cached copy is then out of date.
  - The copy and the original are not consistent.
- There are different strategies for dealing with this.
Web Browser Caching
- Client pull
  - The server provides the content with instructions on when the client should ask for a refreshed copy, or on whether the content should be cached at all.
- Server push
  - The server transmits page information to the browser for display.
  - The browser displays the information and leaves the connection to the server open.
  - With an open connection, the server can continue to push updated pages for display on an ongoing basis. The user can close the connection by closing the page.
  - The server is in control.
- Browser caches are different from proxy caches (discussed next).
Web Caching
- Proxy caches (also called proxy servers)
  - Intercept HTTP requests from clients.
    - Serve the object if it is in the cache and its date is still valid.
    - If not, go to the object's home server: on behalf of the user, the proxy gets the object and possibly deposits it in its cache before returning it to the user.
  - Usually deployed at the edges of a network.
    - Wide-area bandwidth savings, improved response time, and increased availability of static web-based objects.
- A browser may have to be configured to point to the proxy server.
- Usually a proxy cache is purchased and installed by an organization.
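The hit-or-fetch behaviour of a proxy cache can be sketched as follows. This is a minimal illustration, not a real proxy: `ProxyCache`, the fixed `ttl_seconds` lifetime, and the fake origin function are all assumptions made for the example.

```python
import time


class ProxyCache:
    """Serve an object from the cache while it is valid, else fetch it from its home server."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.time):
        self._fetch = fetch_from_origin  # callable: url -> body
        self._ttl = ttl_seconds          # assumed fixed lifetime for every object
        self._clock = clock              # injectable clock, handy for testing
        self._store = {}                 # url -> (body, expires_at)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"
        # Miss or expired: fetch on behalf of the user and deposit in the cache.
        body = self._fetch(url)
        self._store[url] = (body, now + self._ttl)
        return body, "MISS"


origin_calls = []

def fake_origin(url):
    """Stand-in for the object's home server."""
    origin_calls.append(url)
    return f"body of {url}"

now = [0.0]  # mutable fake clock
proxy = ProxyCache(fake_origin, ttl_seconds=60.0, clock=lambda: now[0])
first = proxy.get("/index.html")   # fetched from the origin
second = proxy.get("/index.html")  # served from the cache
now[0] = 61.0
third = proxy.get("/index.html")   # expired, fetched again
```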
Web Caching
- Not all web pages can be cached.
  - If the response carries a Last-Modified tag, the page can be cached.
- A refresh is often done when
  - there is a request; and
  - the expiry time has passed.
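One way to combine the two refresh conditions with the Last-Modified tag is an HTTP conditional request: the cache sends `If-Modified-Since` and the server answers `304 Not Modified` if nothing changed. The sketch below simulates both sides in memory; `origin_respond`, `refresh_on_request`, and the fixed `lifetime` are hypothetical names and values chosen for illustration.

```python
from email.utils import formatdate, parsedate_to_datetime


def origin_respond(resource_mtime, if_modified_since=None):
    """Simulated origin server: returns (status, headers, body)."""
    headers = {"Last-Modified": formatdate(resource_mtime, usegmt=True)}
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since).timestamp()
        if resource_mtime <= since:
            return 304, headers, None  # not modified: no body is sent
    return 200, headers, "fresh body"


def refresh_on_request(entry, now, resource_mtime, lifetime=100):
    """Called when a request arrives; contacts the origin only after expiry."""
    if now < entry["expires_at"]:
        return entry, "served from cache"
    status, headers, body = origin_respond(resource_mtime, entry["last_modified"])
    renewed = dict(entry, expires_at=now + lifetime,
                   last_modified=headers["Last-Modified"])
    if status == 304:
        return renewed, "revalidated"  # the cached body is still good
    renewed["body"] = body
    return renewed, "refetched"


entry = {
    "body": "old body",
    "expires_at": 100,
    "last_modified": formatdate(1_000_000, usegmt=True),
}
before_expiry = refresh_on_request(entry, now=50, resource_mtime=1_000_000)
unchanged = refresh_on_request(entry, now=150, resource_mtime=1_000_000)
changed = refresh_on_request(entry, now=150, resource_mtime=2_000_000)
```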
Cooperative Caching
- A caching infrastructure can have multiple web proxies.
  - Proxies can be arranged in a hierarchy or in other structures.
  - Proxies can cooperate with one another to
    - answer client requests;
    - propagate server notifications.
  - Cooperation uses a combination of HTTP and ICP (Internet Cache Protocol).
    - ICP lets one cache quickly ask another cache whether it has an object.
    - HTTP is used to actually retrieve the object.
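The division of labour between the two protocols can be sketched with in-memory objects. Nothing here speaks real ICP or HTTP: `icp_query` stands in for the lightweight "do you have it?" exchange and `http_get` for the actual transfer, and all class and function names are hypothetical.

```python
class Cache:
    """A cooperating proxy cache, reduced to a dictionary of objects."""

    def __init__(self, name, objects=None):
        self.name = name
        self.objects = dict(objects or {})

    def icp_query(self, url):
        """Stand-in for an ICP query: a cheap yes/no answer, no object transfer."""
        return url in self.objects

    def http_get(self, url):
        """Stand-in for the HTTP request that actually retrieves the object."""
        return self.objects[url]


def cooperative_get(local, siblings, fetch_from_origin, url):
    """Try the local cache, then siblings (query first, fetch second), then the origin."""
    if local.icp_query(url):
        return local.http_get(url), local.name
    for sibling in siblings:
        if sibling.icp_query(url):        # quick check
            body = sibling.http_get(url)  # retrieval only after a positive answer
            local.objects[url] = body
            return body, sibling.name
    body = fetch_from_origin(url)
    local.objects[url] = body
    return body, "origin"


local = Cache("proxy-1")
siblings = [Cache("proxy-2"), Cache("proxy-3", {"/logo.png": "png bytes"})]
from_sibling = cooperative_get(local, siblings, lambda u: "origin bytes", "/logo.png")
from_origin = cooperative_get(local, siblings, lambda u: "origin bytes", "/new.html")
from_local = cooperative_get(local, siblings, lambda u: "origin bytes", "/logo.png")
```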
Problems
- Caching proxies do not serve all Internet users.
- Content providers (say, web servers) cannot rely on the existence and correct implementation of caching proxies.
- Accounting issues with caching proxies:
  - Example: www.cnn.com needs to know the number of hits to the advertisements displayed on its web pages, but a request answered by a proxy never reaches the server.
Content Distribution Networks (CDN)
- Business model: a content provider such as www.cnn.com or Yahoo pays a CDN company (such as Akamai) to get its content to the requesting users with short delays.
- A CDN provides a mechanism for
  - replicating content on multiple servers in the Internet;
  - providing clients with a means to determine the servers that can deliver the content fastest.
Terminology
- Content: any publicly accessible combination of text, images, applets, frames, MP3, video, Flash, virtual reality objects, etc.
- Content provider: any individual, organization, or company that has content that it wishes to make available to users.
- Origin server: the content provider's server, where the content is first uploaded.
- Surrogate server (sometimes called edge server): the content distributor's server, where the replicated content is kept.
Players
[Figure: the players. Content providers (Yahoo, MSNBC, CNN, CBC) send content to a content distributor (Akamai). Hardware and software vendors (Cisco, Oracle, Sun) and a hosting provider (Bell) are connected by arrows labeled "sells servers" and "installs servers".]
CDN Distribution
- Content providers are the CDN's customers.
- Content replication
  - The CDN company installs thousands of servers throughout the Internet,
    - in large datacenters, or
    - close to users.
  - The CDN replicates its customers' content.
  - When a provider updates content, the CDN updates its servers.
[Figure: an origin server in North America, a CDN distribution node, and CDN servers in S. America, Asia, and Europe.]
CDN: Functional Components
- Distribution service
- Redirection service
- Accounting and billing system
CDN: Distribution Service
- The content provider determines which of its objects it wants the CDN to distribute.
- The content provider tags and then pushes this content to a CDN node, which in turn replicates and pushes the content to all its CDN servers.
CDN: Redirection
- When a browser in a user's host is instructed to retrieve a specific object (specified using a URL), how does the browser determine whether it should retrieve the object from the origin server or from one of the CDN servers?
- As an example, suppose the hostname of the content provider is www.cnn.com.
How Akamai Works
[Figure: in step 1 the end user sends GET index.html over HTTP to cnn.com (the content provider); in step 2 the returned page references an object under http://a73.g.akamai.net/7/23/. Also shown: DNS root server, Akamai global DNS server, Akamai regional DNS server, an Akamai cluster, and a nearby Akamai cluster.]
CDN: Redirection
- Users get an HTML document from www.cnn.com; this could be index.html.
- The file index.html uses a modified URL for content that has been replicated.
- Example: if the JPEG files are what has been replicated, then <img src="http://cnn.com/af/foo.jpg"> may be modified as follows: <img src="http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg">
- The browser then needs to resolve a73.g.akamai.net.
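The URL modification can be illustrated with a small rewriting function. This is only a sketch of the idea: the function name `rewrite_for_cdn`, the regular expression, and the choice to match on file extension are assumptions, and the prefix `http://a73.g.akamai.net/7/23` is simply the one from the slide's example.

```python
import re


def rewrite_for_cdn(html, provider_host, cdn_prefix, extensions=(".jpg", ".jpeg")):
    """Point src attributes for replicated object types at the CDN."""
    pattern = re.compile(r'src="http://' + re.escape(provider_host) + r'(/[^"]*)"')

    def rewrite(match):
        path = match.group(1)
        if path.lower().endswith(extensions):
            # Replicated object: embed the original host and path in the CDN URL.
            return f'src="{cdn_prefix}/{provider_host}{path}"'
        return match.group(0)  # not replicated: leave the URL untouched

    return pattern.sub(rewrite, html)


page = ('<img src="http://cnn.com/af/foo.jpg">'
        '<script src="http://cnn.com/app.js"></script>')
rewritten = rewrite_for_cdn(page, "cnn.com", "http://a73.g.akamai.net/7/23")
```

Only the image reference changes; the script keeps pointing at the origin server, which corresponds to replicating just the JPEG files.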
CDN: Redirection
- What does <img src="http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg"> mean?
  - Host part: a73.g.akamai.net
  - Akamai control part: /7/23
  - Content URL: cnn.com/af/foo.jpg (the object /af/foo.jpg at the content provider cnn.com)
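The three parts can be pulled apart mechanically. The sketch below assumes, as in the slide's single example, that the control part is always exactly the first two path segments and that the next segment is the provider's hostname; `split_cdn_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def split_cdn_url(url):
    """Split a rewritten URL into host, control part, provider, and content path."""
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    return {
        "host": parts.hostname,                       # resolved through Akamai DNS
        "control": "/" + "/".join(segments[:2]),      # assumed: two control segments
        "provider": segments[2],                      # original content provider host
        "content_path": "/" + "/".join(segments[3:]), # object path at the provider
    }


pieces = split_cdn_url("http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg")
```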
CDN: Redirection
- DNS is configured so that all queries about g.akamai.net that arrive at a DNS server are sent to an authoritative DNS server for g.akamai.net.
- This authoritative server is referred to as an Akamai DNS server.
How Akamai Works
[Figure: steps 3 and 4 are added to the previous diagram: a DNS lookup for cache.cnn.com is answered with the alias g.akamai.net.]
CDN: Redirection
- When the Akamai DNS server receives the query, it extracts the IP address of the requester (in practice, the address of the local DNS server that issued the query on the browser's behalf).
How Akamai Works
[Figure: steps 5 and 6 are added: a DNS lookup for g.akamai.net is answered with the alias a73.g.akamai.net.]
CDN: Redirection
- Based on the IP address and on information that it has about the Internet (called a map), the Akamai DNS server returns the IP address of an Akamai regional server to the requesting browser, according to a policy,
  - e.g., select the server that is the fewest hops away.
- The regional server may then choose a surrogate server for content retrieval.
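The fewest-hops policy can be sketched as a lookup in a map keyed by client network prefix. The map contents, the documentation-range addresses, and the names `INTERNET_MAP`, `DEFAULT_SERVER`, and `pick_regional_server` are all invented for illustration; a real map would be built from network measurements.

```python
import ipaddress

# Hypothetical map: client prefix -> {regional server address: hop count}.
INTERNET_MAP = {
    ipaddress.ip_network("198.51.100.0/24"): {"192.0.2.10": 4, "192.0.2.20": 9},
    ipaddress.ip_network("203.0.113.0/24"): {"192.0.2.10": 12, "192.0.2.20": 3},
}
DEFAULT_SERVER = "192.0.2.10"  # assumed fallback for requesters not in the map


def pick_regional_server(requester_ip, internet_map=INTERNET_MAP):
    """Return the regional server that is the fewest hops from the requester."""
    address = ipaddress.ip_address(requester_ip)
    for prefix, hops_by_server in internet_map.items():
        if address in prefix:
            return min(hops_by_server, key=hops_by_server.get)
    return DEFAULT_SERVER


near_first = pick_regional_server("198.51.100.7")
near_second = pick_regional_server("203.0.113.9")
unknown = pick_regional_server("10.0.0.1")
```

Swapping the `min` key for a different metric (measured latency, server load) changes the policy without changing the structure.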
How Akamai Works
[Figure: steps 7 and 8 are added: a DNS query for a73.g.akamai.net is answered with the address 1.2.3.4.]
How Akamai Works
[Figure: step 9 is added: the end user sends "GET /foo.jpg" with "Host: cache.cnn.com" to the nearby Akamai cluster.]
How Akamai Works
[Figure: steps 11 and 12 are added, labeled "GET foo.jpg" next to cnn.com (the content provider), alongside the end user's step 9 request ("GET /foo.jpg", "Host: cache.cnn.com") to the nearby Akamai cluster.]
CDN Redirection
- The IP address of the Akamai DNS server is now in the cache of the local DNS server.
  - This implies that it is not always necessary to go to the root DNS server.
- The TTL associated with the IP address of an Akamai server (surrogate) is relatively small.
  - This is done for performance reasons: clients re-resolve the name often, so they can be pointed at a different surrogate as conditions change.
- Akamai content distribution servers are caches.
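The effect of a small TTL can be seen in a toy model of the local DNS server's cache. `LocalResolver`, the 20-second TTL, and the addresses are hypothetical; the authoritative side is reduced to a function that returns an `(address, ttl)` pair.

```python
class LocalResolver:
    """Caches each answer only for the TTL that came with it."""

    def __init__(self, authoritative_lookup, clock):
        self._lookup = authoritative_lookup  # callable: name -> (address, ttl)
        self._clock = clock
        self._cache = {}                     # name -> (address, expires_at)

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within the TTL: no upstream query
        address, ttl = self._lookup(name)
        self._cache[name] = (address, now + ttl)
        return address


# The authoritative side changes its mind between queries (assumed behaviour).
answers = iter([("192.0.2.10", 20), ("192.0.2.20", 20)])
now = [0]
resolver = LocalResolver(lambda name: next(answers), clock=lambda: now[0])

first = resolver.resolve("a73.g.akamai.net")   # upstream query
now[0] = 10
second = resolver.resolve("a73.g.akamai.net")  # answered from the cache
now[0] = 25
third = resolver.resolve("a73.g.akamai.net")   # TTL expired, new surrogate returned
```

With a long TTL the third lookup would still return the first address, so a change in the preferred surrogate would take correspondingly longer to reach clients.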
CDN Redirection
- What if the content is not there?
  - If the requested content is not found, the surrogate asks other surrogates within a specified region for it.
  - If the requested content is still not found, or is stale, a request is made to the original web site.
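The fallback order can be sketched with dictionaries standing in for surrogate caches. Staleness is modelled here with a per-entry expiry time, which is an assumption; `surrogate_get`, `is_fresh`, and the `lifetime` value are hypothetical.

```python
def is_fresh(entry, now):
    """An entry is usable if it exists and has not passed its expiry time."""
    return entry is not None and now < entry["expires_at"]


def surrogate_get(url, now, local, regional_peers, fetch_from_origin, lifetime=300):
    """Look locally, then at peers in the region, then at the original web site."""
    entry = local.get(url)
    if is_fresh(entry, now):
        return entry["body"], "local"
    for peer in regional_peers:
        entry = peer.get(url)
        if is_fresh(entry, now):
            local[url] = entry  # keep a copy for later requests
            return entry["body"], "peer"
    # Not found anywhere in the region, or only stale copies: go to the origin.
    body = fetch_from_origin(url)
    local[url] = {"body": body, "expires_at": now + lifetime}
    return body, "origin"


local = {}
peer_a = {"/af/foo.jpg": {"body": "stale jpeg", "expires_at": 10}}
peer_b = {"/af/foo.jpg": {"body": "jpeg bytes", "expires_at": 500}}
peers = [peer_a, peer_b]

from_peer = surrogate_get("/af/foo.jpg", 100, local, peers, lambda u: "origin copy")
from_origin = surrogate_get("/af/bar.jpg", 100, local, peers, lambda u: "origin copy")
from_local = surrogate_get("/af/foo.jpg", 100, local, [], lambda u: "origin copy")
```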
CDN Selection
- The tricky issue is selecting which local content server to use for a particular request.
  - Want to spread load evenly.
  - Want minimal impact if a server is added or removed.
- In Akamai, each surrogate server sends measurement results to the Network Operations Command Center (NOCC).
  - Measurement results include the number of active TCP connections, the HTTP request arrival rate, bandwidth availability, etc.
  - This information is used by the Akamai DNS server.
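The slide does not say how the two goals are met. One well-known technique that addresses both is consistent hashing, sketched below as an assumption rather than as a description of Akamai's production system; `HashRing`, the replica count, and the server names are hypothetical.

```python
import bisect
import hashlib


def _point(key):
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes to even out the load."""

    def __init__(self, servers=(), replicas=100):
        self._replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self._replicas):
            p = _point(f"{server}#{i}")
            self._owner[p] = server
            bisect.insort(self._points, p)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, url):
        """Return the server owning the first ring position at or after the URL's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect(self._points, _point(url)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["s1", "s2", "s3"])
urls = [f"/img/{i}.jpg" for i in range(1000)]
before = {u: ring.lookup(u) for u in urls}
ring.add("s4")
after = {u: ring.lookup(u) for u in urls}
moved = [u for u in urls if before[u] != after[u]]  # only URLs now owned by "s4"
```

Adding a server moves only the URLs that the new server takes over; every other URL stays where it was. The measurement data mentioned on the slide could be layered on top, for example by skipping servers that report overload.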
Accounting Mechanism
- Accounting mechanisms collect and track information related to request routing, distribution, and delivery.
- Information is gathered in real time and put into log files for each CDN component.
- These logs are sent to the Network Operations Command Center (NOCC).
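As a rough picture of what can be done with such logs, the sketch below aggregates delivery records into per-object hit counts and per-customer byte totals. The four-field log format (server, customer, URL, bytes) is invented for the example and is not an actual CDN log format.

```python
from collections import Counter


def summarize(log_lines):
    """Aggregate delivery records into hit counts and bytes delivered."""
    hits = Counter()        # (customer, url) -> number of requests served
    bytes_sent = Counter()  # customer -> total bytes delivered
    for line in log_lines:
        server, customer, url, size = line.split()
        hits[(customer, url)] += 1
        bytes_sent[customer] += int(size)
    return hits, bytes_sent


# Hypothetical records collected from two surrogate servers.
log = [
    "surrogate-1 cnn.com /af/foo.jpg 20480",
    "surrogate-2 cnn.com /af/foo.jpg 20480",
    "surrogate-1 cnn.com /ads/banner.gif 4096",
]
hits, bytes_sent = summarize(log)
```

Because every surrogate reports its own deliveries, the content provider can still learn how often an object (an advertisement, say) was served even though the requests never reached its origin server.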
Full Site Delivery vs. Partial Site Delivery
- Full site delivery: all the content is delivered by the CDN (including HTML, images, and other objects).
- Partial site delivery: only images, streaming media, and other bandwidth-intensive objects are delivered by the CDN.
Current Akamai Customers
Summary
- We have examined replication and issues related to the design and implementation of a replicated system.
- There are many choices and tradeoffs to consider.