FastReplica: Efficient Large File Distribution within Content Delivery Networks
Lucy Cherkasova (HPLabs, Palo Alto) and Jangwon Lee (UT Austin)

What Is the Problem?
• Content Delivery Networks (CDNs):
  - large-scale distributed networks of servers,
  - servers are located closer to the edges of the Internet.
• The main goals of the CDN architecture are to:
  - minimize the network impact in the content delivery path,
  - overcome the server overload problem for popular sites.
• Content distribution within CDNs, i.e. to the edge servers:
  - pull model: the performance penalty is insignificant for small/medium documents;
  - push model: active replication of the original content is desirable for large documents such as software download packages, media files, etc.
• Replicating a large file to a large set of edge servers is a challenging and resource-intensive task!

Content Distribution in the Internet Environment
• Satellite distribution
  - the content distribution server (or original site) has a transmitting antenna,
  - replica servers (edge servers) have a satellite receiving dish,
  - the content distribution server broadcasts a file via a satellite channel,
  - requires special hardware; expensive.
• Multicast distribution
  - requires multicast support in routers,
  - not widely available across the Internet infrastructure.
• Application-level multicast distribution
  - nodes act as intermediate routers to distribute content along a predefined mesh or tree,
  - performance is limited by the bottleneck link in the path,
  - e.g. informed content delivery across adaptive overlay networks (SIGCOMM 2002).

What Do We Propose? FastReplica
Presentation outline:
• FastReplica in the small (the algorithm core, applicable to 10-30 nodes)
  - preliminary performance analysis of FastReplica in the small
• FastReplica in the large (scaling the algorithm core to thousands of nodes)
• Reliable FastReplica algorithm
• Performance evaluation of the FastReplica prototype in a wide-area testbed

FastReplica in the Small
• Problem statement:
  - Let N0 be a node which has an original file F, and let Size(F) denote the size of file F in bytes.
  - Let R = {N1, ..., Nn} be a replication set of nodes.
  - The problem consists in replicating file F across nodes N1, ..., Nn while minimizing the overall replication time.
• Let the set N1, ..., Nn be in the range of 10-30 nodes.
• File F is divided into n equal consecutive subfiles F1, ..., Fn, where Size(Fi) = Size(F) / n bytes for each i = 1, ..., n (a splitting sketch follows below).
• FastReplica in the small consists of two steps:
  - Distribution step,
  - Collection step.
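The slides only state that F is cut into n consecutive pieces with Size(Fi) = Size(F) / n; the helper name and the byte-level handling in this minimal Python sketch are our own assumptions.

```python
import os

def split_file(path, n):
    """Split file F into n consecutive subfiles F1..Fn of (nearly) equal size."""
    size = os.path.getsize(path)
    chunk = size // n                      # the last subfile absorbs any remainder
    subfiles = []
    with open(path, "rb") as f:
        for i in range(n):
            to_read = chunk if i < n - 1 else size - chunk * (n - 1)
            subfiles.append(f.read(to_read))
    return subfiles                        # subfiles[i] holds F(i+1)
```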

FastReplica in the Small: Distribution Step
[Diagram: origin node N0 holds file F = F1, ..., Fn and sends subfile Fi to node Ni over n concurrent connections.]
Origin node N0 opens n concurrent connections to nodes N1, ..., Nn and sends to each node Ni the following items:
• a distribution list of nodes R = {N1, ..., Nn} to which subfile Fi has to be sent in the next step;
• subfile Fi.
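A minimal sketch of this step, assuming TCP connections and a simple length-prefixed, pickle-based wire format (the slides do not specify a transport or message layout):

```python
import pickle
import socket
import threading

def distribution_step(replica_addrs, subfiles):
    """Origin node N0: open n concurrent connections and send node Ni its
    subfile Fi together with the distribution list R (the replica addresses)."""
    def send_to(i, addr):
        payload = pickle.dumps({"list": replica_addrs, "index": i,
                                "subfile": subfiles[i]})
        with socket.create_connection(addr) as sock:
            # length-prefixed framing is an assumed wire format, not the paper's
            sock.sendall(len(payload).to_bytes(8, "big") + payload)

    threads = [threading.Thread(target=send_to, args=(i, addr))
               for i, addr in enumerate(replica_addrs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```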

FastReplica in the Small: Collection Step (View “from a Node”)
[Diagram: node N1, after receiving F1 from N0, sends F1 to the other group members N2, ..., Nn over n-1 concurrent connections.]
After receiving Fi, node Ni opens (n-1) concurrent network connections to the remaining nodes in the group and sends subfile Fi to them.
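A matching sketch of the send side of the collection step, under the same assumed framing as above (the receive side, which gathers the n-1 complementary subfiles, is omitted for brevity):

```python
import pickle
import socket
import threading

def send_subfile(addr, index, data):
    # Same length-prefixed framing assumed in the distribution-step sketch above.
    payload = pickle.dumps({"index": index, "subfile": data})
    with socket.create_connection(addr) as sock:
        sock.sendall(len(payload).to_bytes(8, "big") + payload)

def collection_step(my_index, replica_addrs, my_subfile):
    """Node Ni: open n-1 concurrent connections and forward subfile Fi to every
    other group member."""
    threads = [threading.Thread(target=send_subfile,
                                args=(addr, my_index, my_subfile))
               for j, addr in enumerate(replica_addrs) if j != my_index]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```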

FastReplica in the Small: Collection Step (View “to a Node”)
[Diagram: node Nn receives the complementary subfiles F1, ..., Fn-1 from the other group members while sending Fn to them.]
Thus each node Ni has:
• (n - 1) outgoing connections for sending subfile Fi,
• (n - 1) incoming connections from the remaining nodes in the group, delivering the complementary subfiles F1, ..., Fi-1, Fi+1, ..., Fn.

What Is the Main Idea of FastReplica?
Instead of the typical replication of the entire file F to n nodes over n Internet paths, FastReplica exploits (n x n) different Internet paths within the replication group, where each path is used for transferring only 1/n-th of file F.
Benefits:
• the impact of congestion along any of the involved paths is limited to a transfer of 1/n-th of the file;
• FastReplica takes advantage of both the upload and download bandwidth of the recipient nodes.

Preliminary Performance Analysis of FastReplica in the Small
• Two performance metrics: average and maximum replication time.
• Idealistic setting: all the nodes and links are homogeneous, and each node can support n network connections to other nodes at B bytes/sec each.
  Time_distr = Size(F) / (n x B)
  Time_collect = Size(F) / (n x B)
  FastReplica:      Time_FR = Time_distr + Time_collect = 2 x Size(F) / (n x B)
  Multiple Unicast: Time_MU = Size(F) / B
  Replication_Time_Speedup = Time_MU / Time_FR = n / 2
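A quick numerical check of these formulas; the 9 MB / 8-node / 1 MB/s figures below are illustrative, not measurements from the paper:

```python
def idealized_times(size_f, n, B):
    """Idealized model: homogeneous nodes, every connection runs at B bytes/sec."""
    t_distr = size_f / (n * B)            # origin sends the n subfiles in parallel
    t_collect = size_f / (n * B)          # each node forwards its subfile to n-1 peers
    t_fr = t_distr + t_collect            # FastReplica: distribution + collection
    t_mu = size_f / B                     # Multiple Unicast: whole file on each path
    return t_fr, t_mu, t_mu / t_fr        # speedup = n / 2

# e.g. a 9 MB file, n = 8 nodes, B = 1 MB/s per connection:
print(idealized_times(9 * 2**20, 8, 2**20))   # (2.25, 9.0, 4.0) -> speedup n/2 = 4
```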

Uniform-Random Model
Let BW denote the bandwidth matrix, where BW[i][j] reflects the available bandwidth of the path from Ni to Nj. Let BW[i][j] = B x random(1, Var), where Var is the bandwidth variance.
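A small sketch of generating such a matrix; reading random(1, Var) as a uniform draw on [1, Var] is our assumption:

```python
import random

def bandwidth_matrix(n, B, var):
    """Uniform-random model: BW[i][j] = B * random(1, Var) for the path Ni -> Nj."""
    return [[B * random.uniform(1, var) if i != j else float("inf")
             for j in range(n + 1)]      # column j = destination Nj (index 0 is N0)
            for i in range(n + 1)]       # row i = source Ni
```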

Maximum Latency Speedup under the Uniform-Random Model
[Diagram: the worst path transferring the entire file F vs. the worst two-segment path transferring 1/n-th of file F.]
Comparing the worst path transferring the entire file F against the worst path with two segments transferring 1/n-th of file F leads to an n/2 improvement in maximum latency.

Example with Skewed Path Bandwidth
[Diagram: bandwidth of the paths in a 10-node replication group; the labeled paths have available bandwidth of either B or 0.1B.]
At first glance, the cross-node connections have significantly worse available bandwidth.
Question: what is FastReplica's performance in this configuration?

FastReplica Performance for the “Skewed” Example
While the average replication time is almost the same under FastReplica and Multiple Unicast, the maximum replication time under FastReplica is 5 times better!

Modified Example
[Diagram: bandwidth of the paths; all connections from the origin node N0 to the recipient nodes are B, while all cross-node connections are 0.1B.]
Let all the connections from the origin node to the recipient nodes be B, while all the cross-node connections have an available bandwidth of 0.1B.
Question: what is the performance of FastReplica in this configuration?

FastReplica Performance for the Modified “Skewed” Example
In this configuration, FastReplica does not provide any performance benefits compared to Multiple Unicast.
The number n of nodes in FastReplica in the small plays an important role here: a larger value of n provides a higher “safety” level for FastReplica's performance. A larger value of n helps to offset a larger difference between
• the available bandwidth from the origin node to the nodes in the replication group, and
• the available bandwidth within the replication group.

FastReplica in the Large
Scaling process:
• All the nodes are partitioned into groups of k nodes, where k is the number of network connections chosen for concurrent transfers between a single node and multiple receiving nodes.
• Once a group of nodes receives the entire file F, its nodes act as origin nodes and replicate file F to the next set of groups.
Example: let k = 10. In 3 iterations (each taking 2 steps: distribution and collection), the original file can be replicated to roughly 1000 nodes (10 x 10 x 10); a sketch of this growth follows below.
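A back-of-the-envelope sketch of that growth, assuming every freshly served node originates exactly one new group of k nodes in the following iteration (our reading of the scaling process):

```python
def nodes_reached(k, iterations):
    """Growth of the scaled scheme: after each iteration, the nodes that have
    just received file F act as origin nodes for the next level of groups."""
    new_origins, total = 1, 0            # before iteration 1, only N0 has the file
    for _ in range(iterations):
        new_nodes = new_origins * k      # each fresh origin replicates to k nodes
        total += new_nodes
        new_origins = new_nodes
    return total

print(nodes_reached(10, 3))              # -> 1110, i.e. on the order of 10^3 nodes
```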

Reliable FastReplica
• The basic algorithm is sensitive to node failures:
  - if node N1 fails during either the distribution or the collection step, then this event may impact all the nodes N2, ..., Nn in the group, because each node depends on node N1 to receive subfile F1;
  - if node N1 fails while acting as an origin node, this failure impacts all of the replication groups in the dependent replication subtree.
• Goal: design an algorithm which efficiently deals with node failures by making local repair decisions within the particular group of nodes.

Reliable FastReplica
Heartbeat group: an origin node and its recipient nodes. The recipient nodes send heartbeat messages to the origin node: “I'm alive. I am performing a distribution (or collection) step to nodes {Ni1, ..., Nij} in group G'.”
Different failure modes of a node:
• the node acts as an origin node;
• the node acts as a recipient node performing a distribution/collection step.
[Diagram: replication group G' with origin node N'0 and recipient nodes N'1, ..., N'k, attached under a higher-level origin node N^0.]
If node N'0 fails while acting as the origin node for replication group G', then G' should be “reattached” to a higher-level origin node N^0, and N^0 acts as a replacement node for N'0.
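A small sketch of what such a heartbeat and the origin-side failure check could look like; the field names and the timeout value are assumptions, since the slides only describe the message informally:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Heartbeat:
    """One heartbeat report from a recipient node to its group's origin node."""
    sender: str                                    # e.g. "N1'"
    step: str                                      # "distribution" or "collection"
    targets: list = field(default_factory=list)    # nodes {Ni1, ..., Nij} being served
    sent_at: float = field(default_factory=time.time)

def suspected_failures(last_seen, timeout=10.0, now=None):
    # Origin-side check: a recipient whose last heartbeat is older than the
    # timeout is suspected to have failed and triggers a local repair decision.
    now = time.time() if now is None else now
    return [node for node, t in last_seen.items() if now - t > timeout]
```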

Reliable FastReplica (cont.)
If N'i fails while acting as a recipient node during either the collection or the distribution step, then N'0 performs the following repair step:
[Diagram: N'0, which holds the entire file F, sends the failed node's subfile Fi to the remaining nodes N'1, ..., N'k of the group.]
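A hedged sketch of that repair step, as we read the diagram (the node and subfile names and the transfer primitive are placeholders):

```python
def repair_step(group, failed, subfiles, send):
    """The group's origin node, which holds the entire file F, takes over the
    failed recipient's duty and re-sends its subfile Fi to the survivors."""
    i = group.index(failed)              # the failed node Ni was responsible for Fi
    for node in group:
        if node != failed:
            send(node, i, subfiles[i])   # 'send' stands in for the transfer primitive

# usage with a stub transport:
repair_step(["N1'", "N2'", "N3'"], failed="N2'",
            subfiles=[b"F1-bytes", b"F2-bytes", b"F3-bytes"],
            send=lambda node, i, data: print(f"resend F{i + 1} to {node}"))
```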

Reliable FastReplica (cont.)
• The proposed algorithm handles a single node failure within a group with a minimal performance penalty.
• The number of heartbeat messages in such a group is very small, because only the recipient nodes send heartbeat messages to their origin node. This structure significantly simplifies the protocol.

Performance Evaluation of the FastReplica Prototype in a Wide-Area Testbed
Thanks to our summer interns, we built a wide-area testbed of 9 nodes and used it for the performance evaluation of the FastReplica prototype.

Experimental Wide-Area Testbed
[Map: geographic locations of the hosts N0, ..., N8.]

Goals of the Performance Study
• We compare the following distribution schemes:
  - FastReplica in the small;
  - Sequential Unicast: approximates distribution via IP multicast; measures the transfer time of the entire file from the source to each recipient independently;
  - Multiple Unicast: simultaneously transfers the entire file to all the recipient nodes using concurrent connections.
• We evaluate two metrics:
  - average replication time,
  - maximum replication time.
• We experimented with files of 9 different sizes: 80 KB, 750 KB, 1.5 MB, 3 MB, 4.5 MB, 6 MB, 7.5 MB, 9 MB, 36 MB.
• Each point in the results averages 10 different runs performed over a 10-day period.

Average Replication Time
n paths transferring the entire file vs. (n x n) paths transferring only 1/n-th of the file:
• congestion on any of the n paths from the origin node to the recipient nodes impacts both Multiple Unicast and Sequential Unicast, while FastReplica uses any of those paths for transferring only 1/n-th of the file;
• FastReplica significantly outperforms Multiple Unicast and, in most cases, outperforms Sequential Unicast.

Maximum Replication Time
• FastReplica significantly outperforms both Multiple Unicast and Sequential Unicast.
• The maximum replication time under Multiple Unicast and Sequential Unicast is much higher than the corresponding average replication time.

FastReplica: Average and Maximum Replication Times
The maximum and average replication times under FastReplica are very close. These results demonstrate the robustness and predictability of the performance results under the new strategy.

FastReplica Performance (cont.)
The figure shows the average replication time measured at different, individual recipient nodes for a 9 MB file and 8 nodes in the replication set.
• The replication time is highly variable under Multiple Unicast and Sequential Unicast.
• The file replication times under FastReplica across the different nodes in the replication set are much more stable and predictable.

Average and Maximum Time Speedup under FastReplica
FastReplica significantly outperforms Multiple Unicast. For the configuration of 8 nodes, the performance benefits are:
• 4 times (average) to 13 times (maximum) for a 1.5 MB file,
• 3.5 times (average) to 5 times (maximum) for a 9 MB file,
• 4 times (average) to 6.5 times (maximum) for a 36 MB file.

File Size Sensitivity Analysis
The files of 80 KB and 750 KB are the smallest ones used in our experiments. For an 80 KB file, FastReplica is not efficient, while for a 750 KB file it already becomes efficient. (These results depend on the number of nodes in the replication set!)

Experiments with a Different Configuration
• Additional analysis revealed that the available bandwidth of the paths between the origin node N0 (hp.com) and nodes N1, N2, ..., N7 (university machines) is significantly lower than the cross bandwidth between nodes N1, N2, ..., N7. Node N8 also had very limited incoming bandwidth from N0, N1, ..., N7, while the outgoing bandwidth from N8 to N0, N1, ..., N7 was significantly higher.
• Different configuration: let N1 (utexas.edu) be the origin node.
• What is FastReplica's performance in the new configuration?

FastReplica Speedup in the New Configuration
In the new configuration, the average replication times under FastReplica and Multiple Unicast are similar, but the maximum replication time under FastReplica is significantly better than under Multiple Unicast.

Conclusion and Future Directions
• In this work, we introduce FastReplica for efficient and reliable replication of large files in the Internet environment.
• FastReplica is simple and inexpensive. It does not require any changes or modifications to the existing Internet infrastructure, and it significantly reduces the file replication time.
• Interesting future directions:
  - how to better cluster nodes into replication groups?
  - how to build an efficient overlay tree on top of those groups?
  - designing ALM-FastReplica by combining FastReplica's ideas with ALM (Application-Level Multicast).

Acknowledgements
We would like to thank:
• the HPLabs summer interns who helped us to build the wide-area testbed: Yun Fu, Weidong Cui, Taehyun Kim, Kevin Fu, Zhiheng Wang, Shiva Chetan, Xiaoping Wei, and Jehan Wickramasuriya;
• John Apostolopoulos for motivating discussions;
• John Sontag for his active support of this work;
• our shepherd Srinivasan Seshan and the anonymous referees for useful remarks and insightful questions.
Their help is highly appreciated!