Large Scale File Distribution
Sequential Branching Distribution
Final Presentation, Grad Operating Systems
Presented by Chris Miller & Pramita Mitra
Dec 13, 2006
Problem Statement
● Research requires distribution of large datasets on distributed networks
● Methods such as multicast are too complicated to implement reliably
● Tools available for file distribution: Chirp, Parrot
● Algorithm needed to efficiently schedule the distribution of files
Solution
● Using CCL storage pool as model of distributed network
● Using small, measured steps to find what aspects of distribution work best in implementation

Sequential distribution
Distributor: Stage 1, Stage 2, …, Stage n
Inefficient use of network resources.
Total time for distribution: O(n).

Parallel distribution
Distributor: Node 1, Node 2, …, Node n
Total time for distribution: O(n).
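The two O(n) totals can be checked with a simple timing model. This is a minimal sketch, assuming the distributor's uplink is the bottleneck, that sequential distribution sends the whole file to one node per stage, and that parallel streams share the uplink equally. The function names and example figures are hypothetical and do not come from the presentation.

```python
def sequential_time(n_nodes, file_mb, link_mb_per_s):
    """Distributor sends the full file to one node per stage (assumed model)."""
    return n_nodes * (file_mb / link_mb_per_s)


def parallel_time(n_nodes, file_mb, link_mb_per_s):
    """Distributor sends to all nodes at once; streams split the uplink evenly (assumed model)."""
    per_stream = link_mb_per_s / n_nodes
    return file_mb / per_stream


if __name__ == "__main__":
    # Hypothetical figures: 250 MB file, 12.5 MB/s uplink.
    for n in (4, 8, 16):
        print(n, sequential_time(n, 250.0, 12.5), parallel_time(n, 250.0, 12.5))
```

Under these assumptions both functions grow linearly with the number of nodes, which is why sending in parallel from a single distributor does not improve on the sequential total.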
Baseline Results
Sequential Branching Distribution
[Diagram: node set with the Distributor; Thirdput transfers fan out across Stage 1, Stage 2, Stage 3]
Total time for distribution: O(log2 n)
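The O(log2 n) total can be illustrated by building a stage-by-stage schedule. This is a minimal sketch, assuming that in every stage each node already holding the file sends it to exactly one node that does not, and that all transfers in a stage take the same time. The names `branching_schedule` and `stage_count` are hypothetical, and the sketch only plans the transfers; it does not call Chirp or Thirdput.

```python
import math


def branching_schedule(nodes, distributor="distributor"):
    """Return a list of stages; each stage is a list of (source, target) pairs."""
    holders = [distributor]   # nodes that already have the file
    pending = list(nodes)     # nodes still waiting for the file
    stages = []
    while pending:
        stage = []
        for source in holders:
            if not pending:
                break
            stage.append((source, pending.pop(0)))
        # New holders only start sending in the next stage.
        holders = holders + [target for _, target in stage]
        stages.append(stage)
    return stages


def stage_count(n_nodes):
    """Stages needed when the number of holders doubles every stage."""
    return math.ceil(math.log2(n_nodes + 1))


if __name__ == "__main__":
    for i, stage in enumerate(branching_schedule([f"node{k}" for k in range(1, 8)]), 1):
        print(f"Stage {i}: {stage}")
```

With seven target nodes the schedule has three stages of 1, 2, and 4 transfers, matching the three stages shown on the slide.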
Best Neighbor Approximation
Best Neighbor Approximation
Probabilistic Weighted Average
Best Neighbor Approximation

Overhead and Net Reduction are given per data file size (100 MB, 250 MB, 500 MB, 1 GB).

File Size  Reduction in   100 MB                   250 MB                   500 MB                   1 GB
           Transfer Time  Overhead  Net Reduction  Overhead  Net Reduction  Overhead  Net Reduction  Overhead  Net Reduction
1 MB       16.64%         39.93%    -23.3%         15.97%    0.7%           7.99%     8.7%           3.99%     12.6%
2 MB       28.44%         44.32%    -15.9%         17.73%    10.7%          8.86%     19.6%          4.43%     24.0%
3 MB       29.12%         50.16%    -21.0%         20.07%    9.1%           10.03%    19.1%          5.02%     24.1%
4 MB       23.20%         55.61%    -32.4%         22.24%    1.0%           11.12%    12.1%          5.56%     17.6%
5 MB       27.39%         67.07%    -39.7%         26.83%    0.6%           13.41%    14.0%          6.71%     20.7%
Latency    16.59%         15.55%    1.0%           6.22%     10.4%          3.11%     13.5%          1.56%     15.0%
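The table values follow a simple pattern that can be reproduced in a few lines. This is a minimal sketch based on two relationships inferred from the numbers, not stated in the presentation: the net reduction equals the reduction in transfer time minus the overhead, and the overhead shrinks in proportion to the data file size, with 1 GB treated as 1000 MB. The function names are hypothetical.

```python
def scaled_overhead(overhead_at_100mb, data_mb):
    """Overhead (%) for a data file of data_mb, scaled from the 100 MB column (inferred rule)."""
    return overhead_at_100mb * 100.0 / data_mb


def net_reduction(reduction, overhead):
    """Net reduction (%) = reduction in transfer time minus overhead (inferred rule)."""
    return reduction - overhead


if __name__ == "__main__":
    # 1 MB row: 16.64% reduction, 39.93% overhead on a 100 MB data file.
    for data_mb in (100, 250, 500, 1000):
        overhead = scaled_overhead(39.93, data_mb)
        print(data_mb, f"{overhead:.2f}%", f"{net_reduction(16.64, overhead):.1f}%")
```

Read this way, the fixed cost of a probe pays off only once the data file is large enough: the same 1 MB probe gives a negative net reduction at 100 MB and a positive one at 250 MB and above.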
Results
Conclusions
● A fast and reliable distribution method is possible with simple file transfer methods
● Distribution system is fault tolerant for all nodes except the distributor node
● Latency measurement: moderate indicator of transfer rate, low overhead
● Small file transfer approximation: strong indicator of transfer rate, high overhead
● Performance is near O(log2 n)