Smart Content Delivery in Large Networks: En-Route Caching

  • Slides: 41
Smart Content Delivery in Large Networks: En-Route Caching
Hong Shen
School of Computer Science, University of Adelaide, Australia
Dept. of Computer Sci. & Tech., University of Sci. & Tech. of China

Outline of the Talk
  • Problem formulation
  • Unconstrained solution
  • Constrained solutions
  • Solution for m servers

Content Distribution Network
  • Sits between content providers and content consumers.
  • Contains hundreds of servers throughout the Internet.
  • Replicates and maintains customers' content in CDN servers.

CDN Example: Google platform
  • Maintains over 450,000 CDN servers, arranged in racks located in clusters in cities around the world.
  • Allows users to access its content most rapidly by sending them to lightly loaded and geographically proximate servers.

Bottleneck of CDNs
  • Multiple transmission flows for the same object.
  • Solution: caching the object in selected nodes.
  • Challenges: WHEN and HOW?
[Figure: network with nodes s1, s2, s3]

En-Route Object Caching
  • Object caching: store the most commonly accessed objects close to clients.
  • En-route object caching: objects are cached at selected nodes on the access path from client to server.
[Figure: request and object flows between client and server; legend: Server, Hold no copy, Hold a copy]
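A minimal sketch of the lookup this slide describes: a request walks the access path toward the server and is answered by the first node that holds a copy. The function name `serve_request`, the node labels, and the hop count as a cost measure are assumptions made for illustration; they do not come from the talk.

```python
def serve_request(path, holders):
    """Return (serving_node, hops) for a request that walks `path`.

    `path` lists the nodes from the client to the origin server, in order.
    `holders` is the set of en-route nodes that currently hold a copy.
    The origin server (last node on the path) always has the object.
    """
    for hops, node in enumerate(path):
        if node in holders or hops == len(path) - 1:
            return node, hops
    raise ValueError("path must contain at least the server")


# Hypothetical access path: client -> a -> b -> c -> server
path = ["client", "a", "b", "c", "server"]
print(serve_request(path, holders={"b"}))   # served en route by b
print(serve_request(path, holders=set()))   # falls through to the server
```

With a copy at `b` the request stops after two hops instead of four, which is the saving that en-route caching aims for.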

En-Route Object Caching (cont.)
Why en-route? Important observations:
  • Users normally have regular access patterns;
  • Storing an object at en-route nodes during delivery does not consume extra bandwidth.

Caching Performance
The performance of en-route object caching depends mainly on two factors:
  • The locations of the caches (Cache Location)
  • The management of the cache contents (Content Replacement)
Coordinated caching: consider both factors when making a caching decision.

Our Work
  • Web object en-route caching in tree networks. ACM Transactions on Internet Technology, Vol. 5, No. 3, 2005, pp. 480-507.
  • Multimedia object en-route caching in tree networks. ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 1, No. 3, 2005, pp. 289-314.
  • Multimedia object placement for transparent data replication in linear array. IEEE Transactions on Parallel & Distributed Systems, Vol. 18, No. 2, 2007, pp. 212-224.
  • Multiserver en-route web caching. IEEE Transactions on Computers (under review), 2007.

Definitions and Notations
  • G = (V, E) is a graph, where V is the set of nodes and E is the set of links.
  • Cost saving s(v): the cost saving of storing a new object in node (cache) v.
  • Cost loss l(v): the cost loss of removing other objects from node v in order to accommodate the new object.
  • Cost gain g(v): g(v) = s(v) − l(v).

Problem Formulation
Find a node set P to store the object such that the total cost gain is maximized:
$G(P) = \sum_{v \in P} g(v) = \sum_{v \in P} \big(s(v) - l(v)\big)$
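The objective can be illustrated with a toy model. Everything below is an assumption made for the sketch, not the talk's cost model: a small tree given as a `parent` map with the server at node 0, unit-cost links, made-up request rates `freq`, and made-up cost losses `loss`. The search is exhaustive, so it only shows what is being maximized; it is not one of the talk's algorithms.

```python
from itertools import combinations

# Hypothetical tree: parent[v] is the next hop from v toward the server (node 0).
parent = {1: 0, 2: 1, 3: 1, 4: 2}
freq = {1: 0, 2: 5, 3: 1, 4: 4}      # assumed request rate issued at each node
loss = {1: 3, 2: 6, 3: 3, 4: 1}      # assumed cost loss l(v) of evicting at v


def access_cost(copies):
    """Total hops travelled by all requests, each served by the nearest
    copy on its path to the server (the server always has the object)."""
    total = 0
    for v, f in freq.items():
        hops, node = 0, v
        while node != 0 and node not in copies:
            node, hops = parent[node], hops + 1
        total += f * hops
    return total


def total_gain(copies):
    """G(P): access cost saved relative to no caching, minus the cost losses."""
    saving = access_cost(set()) - access_cost(copies)
    return saving - sum(loss[v] for v in copies)


def best_placement():
    """Exhaustively search all node sets P (exponential; for illustration only)."""
    nodes = sorted(parent)
    candidates = (set(c) for r in range(len(nodes) + 1)
                  for c in combinations(nodes, r))
    return max(candidates, key=total_gain)


print(best_placement(), total_gain(best_placement()))
```

In this toy instance, caching at node 1 alone is worthwhile, yet the best set leaves node 1 out: the saving a node contributes depends on where the other copies sit, which is why the placement has to be chosen jointly.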

Problem Formulation for Tree Networks
[Figure: tree network G with labels D(w), w, $A_w$, v', v, f(v), f'(v); legend: Server, Hold no copy, Hold a copy]

Constraints
The different cases of C include:
  • C is null (unconstrained).
  • The cost gain for each node is greater than zero, i.e., g(v) > 0 for all v in P.
  • The number of copies is exactly k, i.e., $|A_w| = k$.
  • The number of copies is no more than k, i.e., $|A_w| \le k$.
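The four cases can be written as a single feasibility check on a candidate placement. This is only a sketch: the function name `satisfies` and the string labels for the constraints are invented here, and `gain` is assumed to be a precomputed map from node to g(v).

```python
def satisfies(placement, gain, constraint=None, k=None):
    """Check a candidate placement against one of the four cases of C.

    `gain` maps each node v to its cost gain g(v); the constraint names
    below are labels invented for this sketch.
    """
    if constraint is None:                       # C is null
        return True
    if constraint == "positive_gain":            # g(v) > 0 for all v in P
        return all(gain[v] > 0 for v in placement)
    if constraint == "exactly_k":                # exactly k copies
        return len(placement) == k
    if constraint == "at_most_k":                # no more than k copies
        return len(placement) <= k
    raise ValueError(f"unknown constraint: {constraint}")


gain = {"a": 4, "b": -1, "c": 2}                 # made-up gains
print(satisfies({"a", "c"}, gain, "positive_gain"))
print(satisfies({"a", "b"}, gain, "positive_gain"))
print(satisfies({"a", "b"}, gain, "at_most_k", k=2))
```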

Solution for Unconstrained Case
Main idea: decompose the tree recursively, level by level, into a set of lines or singletons (nodes) whose solutions are known. The solution $A_w$ to tree $T_w$ is obtained by combining (taking the union of) the solutions $A_{w,x}$ to $T_w$'s subtrees.

Tree Decomposition (1)
C(w): set of all children of node w.

Decomposition of $A^*_w$
[Figure: $A^*_w$ for node w with children w1 and w2, decomposed into $A^*_{w,w_1}$ and $A^*_{w,w_2}$]

Tree Decomposition (2)

Decomposition of $A^*_{w,x}$
[Figure: decomposition of $A^*_{w,\dots,x}$ over the children x1 and x2 of x, with references to Theo. 1 and Theo. 2; the two-case formula comparing values of $G(T_{w,\dots,x}, \cdot)$ is garbled in the source]

Algorithm 1

Algorithm 1: Continued

Time Complexity
The algorithm runs in time
$t_w = O\big(\sum_{v \in C(w)} (|C(v)| + t_v)\big) = O\big(\sum_{v \in V} |D(v)|\big) = O(n^2)$,
where n is the total number of nodes in the network.
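The last step of the bound can be checked numerically. The sketch below assumes that D(v) denotes the set of proper descendants of v, which the slides do not state explicitly; the helper names are hypothetical. A chain is the worst case, where the sum equals n(n−1)/2 and therefore stays below n².

```python
def descendant_counts(children, root):
    """Return {v: number of proper descendants of v} for a rooted tree."""
    counts = {}
    order, stack = [], [root]
    while stack:                       # iterative DFS to get a visiting order
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    for v in reversed(order):          # children are processed before parents
        counts[v] = sum(1 + counts[c] for c in children.get(v, []))
    return counts


def path_tree(n):
    """A chain 0 -> 1 -> ... -> n-1, the worst case for the sum."""
    return {i: [i + 1] for i in range(n - 1)}


n = 6
total = sum(descendant_counts(path_tree(n), 0).values())
print(total, n * (n - 1) // 2, n * n)
```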

Solution for Constrained Case I
Non-negative cost gain per node (1)

Transformation
The optimal solution for Problem (1) is equivalent to (2)

Algorithm 2

Algorithm 2 (Continued)
Time complexity: $O(n^2)$

Solution for Constrained Case II
Placing exactly k copies (3)

Algorithm 3
Time complexity: $O(n^2 \log(fn))$, where $f = \max\{f(v)\}$.

Solution for Constrained Case III
Placing at most k copies (4)

Algorithm 4
Time complexity: $O(kn^2 \log(fn))$, where $f = \max\{f(v)\}$.

Extension to ASes
System Model:

Solution
  • Divide the whole system into two parts, one of which is a tree.
  • Continue dividing the other part in the same way until only one tree is left.
  • Apply the methods for tree networks.

More General Setting: m-Server En-Route Caching
  • A set of servers $S = \{s_j, 1 \le j \le m\}$ located at leaves of a tree.
  • Cost saving for node w, $s(w, d_j)$, under the condition that the distance from w to the nearest higher-level node towards server $s_j$ that holds a copy is $d_j$.
  • Find a node set P to store the object such that the total gain is maximized ($v \in P$ serves nodes, with gain $g(v, S)$).

The Challenge
We cannot obtain an optimal solution to the multi-server problem by simply combining solutions to 1-server problems.
[Figure: a simple 2-server problem; combining the solutions of the 1-server problems ≠ the optimal solution; legend: Hold a copy, No copy]

A More General Definition
  • Condition $D_w$: $D_w = [d_1, \dots, d_j, \dots, d_m]$, where $d_j$ is the distance from node w to the nearest node towards $s_j$ (for example, u) that holds a copy of object O.
  • $G(w, D_w)$ is the objective value of (6) in $T_w$ under condition $D_w$.
  • $A(w, D_w)$ is the solution corresponding to $G(w, D_w)$.

Lemma 1
For a tree $T_r$ containing m servers at leaf nodes, let $e(w_i, d_j)$ and $k(w_i, d_j)$ denote the distances from $w_i$ to the nearest node towards $s_j$ that holds a copy, for the cases that node $w_i$ holds a copy and holds no copy, respectively. Then we have
[Equation: not preserved in this transcript]
$w_i \in \mathrm{path}[r, s_j]$ means server $s_j$ is in the subtree $T_{w_i}$, because servers are located at leaves.
[Figure: an example of a multi-server network with root r and servers s1, s2, s3]

Theorem 3
For a tree $T_r$ containing m servers at leaf nodes, the optimal solution of (6) is $A(r, D_r)$ and the corresponding objective value is $G(r, D_r)$, where $D_r$ is the vector of distances from the root node to the servers and
[Equation: not preserved in this transcript]

Theorem 3 (cont.)

The Algorithm
  • Main idea: the problem is split top-down, and the solution $A(r, D_r)$ is generated bottom-up according to Theorem 3, with corresponding objective value $G(r, D_r)$.
  • Time complexity: the algorithm computes all $G(w, D_w)$, where $w \in V$, $D_w = [d_1, \dots, d_j, \dots, d_m]$, $0 \le d_j \le h_w$, $h_w$ is the distance from w to $s_j$, and $h_w \le 2h$. The time complexity of the algorithm is $O(nh^m)$.
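The size of the table behind this bound can be sketched by enumerating the condition vectors. The helper names below are hypothetical, and the count covers only the number of (w, D_w) entries; the work spent per entry is not shown on the slides. With every component bounded by 2h, a node has at most (2h+1)^m vectors, so there are at most n(2h+1)^m entries, which is O(nh^m) when m is treated as a constant.

```python
from itertools import product


def condition_vectors(bounds):
    """Enumerate every D_w = [d_1, ..., d_m] with 0 <= d_j <= bounds[j]."""
    return product(*(range(b + 1) for b in bounds))


def count_states(n, h, m):
    """Upper bound on the number of (w, D_w) pairs when every h_w <= 2h."""
    return n * (2 * h + 1) ** m


# A node whose distance bounds toward three servers are 2, 1 and 3.
vectors = list(condition_vectors([2, 1, 3]))
print(len(vectors), vectors[0], vectors[-1])

# Made-up network size: 100 nodes, h = 3, two servers.
print(count_states(n=100, h=3, m=2))
```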

Conclusion
  • New tree decomposition techniques for en-route web caching.
  • The first polynomial-time algorithms for 1-server en-route web caching in tree networks.
  • m-server en-route web caching in tree networks: $O(nh^m)$ time.

Questions?

Calculating cost loss l(v)
  • Cost loss l(v): the additional cost caused by removing some objects from v to make room for the new object:
    [Equation: not preserved in this transcript]
  • Missing penalty m(v): the additional cost of accessing the object if it is not cached at v.
  • E.g. m(3) = c(3, 0), f'(3) = 0; m(7) = c(7, 4); f'(4) = f(6); f'(5) = f(8) + f(9).
[Figure: example tree with server 0 and nodes 1 to 9, with link costs c(3, 0) and c(9, 5) marked; legend: Server, Holding no copy, Holding a copy]