Adaptive File Transfers for Diverse Environments
Himabindu Pucha, Purdue University
David G. Andersen, Carnegie Mellon University
Michael Kaminsky, Intel Research Pittsburgh
Michael Kozuch, Intel Research Pittsburgh
Goal
Correctly and efficiently transfer files in a wide range of scenarios
• Data backup, code update
• Software synchronization
• Different network speeds
[Figure: sender and receiver. Labels: file in-place; network (Gigabit LAN to DSL links); search for similar files; different disk loads; network peers. Scenarios: data backup, code update; software synchronization.]
Problem: Existing Tools Scenario-specific
[Table: cell marks lost in extraction]
Columns: Tool | Files in-place | Other files | Identical peers | Peers
Rows: rsync | BitTorrent | rsync-batch + BitTorrent | dsync
Challenges
• Resources have widely varying performance
• Resource performance changes dynamically
• Support receivers with different initial state
• Do not require resources to be set up in advance
dsync: Design
dsync uses all available resources effectively
[Figure: dsync scheduler, network, disk]
dsync: Design
• Discovers available resources using exposed backpressure information
  • From disk: “I’m busy writing, don’t read from me.”
  • From network: “I have lots of incoming packets, don’t spend time doing IO or computation.”
• Schedules intelligently across available resources
  • Disk: use a pre-computed index and/or search entire disk using heuristics
  • Network: Schedule remaining chunks, least likely to be found on disk
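The two design points above can be illustrated with a minimal Python sketch. It is not dsync's actual scheduler: the `Backpressure` record, the `choose_action` and `network_fetch_order` helpers, the action names, and the `backlog_limit` threshold are all hypothetical names and values chosen for illustration. The sketch only assumes that the disk exposes a "busy writing" signal, that the network exposes its count of unprocessed incoming packets, and that some estimate exists of how likely each chunk is to be found on disk.

```python
from dataclasses import dataclass


@dataclass
class Backpressure:
    """Hypothetical snapshot of the signals each resource exposes."""
    disk_busy_writing: bool   # disk: "I'm busy writing, don't read from me."
    network_backlog: int      # network: number of unprocessed incoming packets


def choose_action(bp: Backpressure, backlog_limit: int = 8) -> str:
    """Pick the next scheduler step from the backpressure signals.

    backlog_limit is an assumed tuning knob, not a value from the talk.
    """
    if bp.network_backlog > backlog_limit:
        # Network is delivering faster than we consume: skip extra IO/computation.
        return "drain_network"
    if bp.disk_busy_writing:
        # Disk is occupied with writes: do not issue reads against it.
        return "fetch_from_network"
    # Both resources have headroom: look for chunks on the local disk.
    return "search_disk"


def network_fetch_order(chunks, disk_hit_probability):
    """Order remaining chunks so those least likely to be on disk go first.

    Chunks with no estimate are treated as probability 0.0 (assumption).
    """
    return sorted(chunks, key=lambda c: disk_hit_probability.get(c, 0.0))


if __name__ == "__main__":
    print(choose_action(Backpressure(disk_busy_writing=False, network_backlog=20)))
    print(network_fetch_order(["a", "b", "c"], {"a": 0.9, "b": 0.1}))
```

Requesting the least-likely-on-disk chunks from the network first leaves the chunks that the disk search may still find for later, so network bandwidth is not spent on data the receiver might already hold.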
dsync: Preliminary Results
[Figure: throughput for 1 GB file on a 1 Gbps link]
dsync defers disk operations when network is faster than disk
Bonus: dsync provides best of BitTorrent, rsync, scp …
dsync: Preliminary Results
[Figure: average download time across 45 receivers, 50% similar file in-place]
dsync speedup: 5x vs. rsync, 2x vs. SET
dsync rapidly locates similar files and effectively combines them with peering
dsync correctly uses backpressure to defer disk operations when network is faster than disk
BACKUP
dsync: Preliminary Results
dsync correctly uses backpressure to defer disk operations when network is faster than disk
Goal
Correctly and efficiently transfer files in a wide range of scenarios
• Data backup
• Code update
• Software synchronization
• Different network speeds
[Figure: sender and receiver. Labels: file in-place; network (Gigabit LAN to DSL links); search for similar files; different disk loads; network peers. Scenarios: data backup, code update; software synchronization.]
Problem: Existing Tools Scenario-specific
Columns: Tool | disk | network | peers (cell marks lost in extraction)
• rsync: (files in-place)
• BitTorrent
• rsync-batch + BitTorrent: (files in-place), ~ (all receivers in identical state)
Challenges
• Correctly use resources with widely varying performance characteristics
• Dynamically adapt to changes in resource performance
• Support receivers with different initial state
• Do not require resources to be set up in advance