Utility-Based Scheduling for Bulk Data Transfers between Distributed Computing Facilities
Xin Wang, Wei Tang, Raj Kettimuthu, Zhiling Lan
Author email: xwang149@hawk.iit.edu
Content
• Motivation
• Problem statement
• Utility optimization model
• Experiments
• Conclusions
• Future work
Motivation
• Data movement over wide-area networks (WANs) is increasingly needed
• WAN bandwidth grows at a slower pace
– shared and limited resources
• Lack of coordination among multi-user bulk data transfers causes problems
– Network contention
– Bandwidth waste
Background - GridFTP
• In GridFTP transfers, the two key performance-tuning mechanisms are:
– Parallelism
– Concurrency
R. Kettimuthu, G. Vardoyan, G. Agrawal, P. Sadayappan, "Modeling and optimizing large-scale wide-area data transfers," in Proc. of CCGrid'14, 2014.
Problem Context
• A scheduler inside the source host makes the following decisions at each scheduling iteration:
– when to start which jobs
– how many TCP connections to assign to each job
Goals
• A scheduler coordinates data transfer requests (jobs) from one source host to multiple destination hosts
– queue prioritization (temporal)
– connection allocation (spatial)
• Improve system performance and user satisfaction
– minimizing job turnaround time
– maximizing aggregate job utility
Background – Utility Function
• User satisfaction can be measured by a utility function.
• A utility function is a function of job turnaround time and represents the value (utility) that the user attaches to the job's completion.
• Maximizing aggregate job utility is consistent with improving overall user satisfaction with job turnaround.
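As a rough illustration of the idea on this slide, the sketch below defines one possible utility function of turnaround time and sums it over jobs. The linear-decay shape and the parameters `grace` and `zero_at` are assumptions made for the example; the slides do not specify the utility functions used.

```python
def linear_decay_utility(turnaround, max_utility=1.0, grace=60.0, zero_at=600.0):
    """Hypothetical utility of a job that finished after `turnaround` seconds.

    Assumed shape (not taken from the slides): full utility up to `grace`
    seconds, then a linear decay that reaches zero at `zero_at` seconds.
    """
    if turnaround <= grace:
        return max_utility
    if turnaround >= zero_at:
        return 0.0
    return max_utility * (zero_at - turnaround) / (zero_at - grace)


def aggregate_utility(turnarounds, utility=linear_decay_utility):
    """Sum the per-job utilities; this is the quantity the scheduler tries to maximize."""
    return sum(utility(t) for t in turnarounds)
```

Any non-increasing function of turnaround time could be substituted for `linear_decay_utility` without changing `aggregate_utility`.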
Base Methods
• FCFS: First-come, first-served
– the scheduler picks the job with the oldest submission time, to maintain fairness with respect to job arrival order.
• SJF: Short (small) job first
– jobs are sorted by the ratio of waiting time to job size.
– larger (longer) jobs can tolerate longer waiting times than smaller (shorter) jobs.
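The two base orderings can be sketched as plain sort keys. The `Job` record and its field names are hypothetical, and sorting SJF by descending waiting-time-to-size ratio is an interpretation of the slide rather than a confirmed detail of the authors' scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str         # hypothetical identifier
    submit_time: float  # submission time in seconds
    size: float         # transfer size in bytes, assumed > 0


def fcfs_order(jobs):
    """First-come, first-served: oldest submission time first."""
    return sorted(jobs, key=lambda j: j.submit_time)


def sjf_order(jobs, now):
    """Short-job-first as described on the slide: rank by waiting time / size.

    Assumption: a higher ratio means higher priority, so a small job that has
    waited a while is placed ahead of a large job that has waited equally long.
    """
    return sorted(jobs, key=lambda j: (now - j.submit_time) / j.size, reverse=True)
```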
Utility-based Data Transfer Scheduling
Algorithm Description
• Step 1: Bandwidth Allocation
– Allocate the shared bandwidth among the different destinations
– Use a max-min fairness approach
– Assign a number of TCP connections to each destination to realize the bandwidth allocation
Under max-min fairness, the lowest demand is maximized first; only after the lowest demand on the network resource has been satisfied is the second-lowest demand maximized, and so on.
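The max-min fairness rule described above can be sketched with the standard progressive-filling procedure. The function name and the `demands` mapping are assumptions for illustration, and the sketch works with fractional shares; how the scheduler converts shares into whole TCP connections is not shown in the slides and is left out here.

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` among destinations using max-min fairness.

    `demands` maps a destination name to its requested share (same unit as
    `capacity`). Destinations are served from the lowest demand upward; each
    receives either its full demand or an equal split of what remains.
    """
    allocation = {}
    remaining = float(capacity)
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        fair_share = remaining / len(pending)  # equal split of what is left
        dest, demand = pending.pop(0)          # lowest unsatisfied demand
        granted = min(demand, fair_share)
        allocation[dest] = granted
        remaining -= granted
    return allocation
```

For example, with a capacity of 10 and demands of 2, 8, and 10, the smallest demand is met in full and the other two destinations split the remaining 8 equally.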
Algorithm description
• Step 2: Job Prioritizing and Selection
– Base job prioritizing methods:
• First-come, first-served (FCFS)
• Short-job-first (SJF)
– Use a sliding window to select the first W (window size) jobs at the head of the queue as candidates for scheduling
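The window selection in Step 2 amounts to taking the first W entries of the already prioritized queue. A minimal sketch, with a hypothetical function name:

```python
from itertools import islice


def select_candidates(prioritized_queue, window_size):
    """Return the first `window_size` jobs from a queue that is already ordered.

    The queue is assumed to have been sorted by FCFS or SJF beforehand; if it
    holds fewer than `window_size` jobs, all of them become candidates.
    """
    if window_size < 0:
        raise ValueError("window_size must be non-negative")
    return list(islice(prioritized_queue, window_size))
```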
Algorithm description
• Step 3: Utility-Based Connection Allocation
– Assign each candidate job a number of TCP connections so as to maximize aggregate utility, by solving the equations:
[equations not preserved in the extracted slide text]
Uj(Tw + Te) defines the estimated utility of assigning Cj TCP connections to job j.
Greedy algorithm for connection allocation
• Initially, distribute the total available connections evenly among the jobs
• Repeatedly exchange connections: take one connection from one job and give it to another whenever doing so increases aggregate utility
• Stop when the aggregate utility cannot be increased any further
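The greedy procedure above might look roughly like the following. This is a sketch under stated assumptions, not the Dsim implementation: `estimated_utility(job, connections)` is a hypothetical callback standing in for the utility estimate, every job is assumed to keep at least `min_per_job` connections, and each round applies the single exchange with the largest gain.

```python
def greedy_allocate(jobs, total_connections, estimated_utility, min_per_job=1):
    """Allocate TCP connections to jobs by repeated one-connection exchanges.

    `jobs` is a list of unique, hashable job identifiers.
    `estimated_utility(job, connections)` is a hypothetical estimate of the
    utility a job obtains when given that many connections.
    """
    n = len(jobs)
    if n == 0:
        return {}
    if total_connections < n * min_per_job:
        raise ValueError("not enough connections for every candidate job")

    # Start from an even split; leftover connections go to the first jobs.
    base, extra = divmod(total_connections, n)
    alloc = {job: base + (1 if i < extra else 0) for i, job in enumerate(jobs)}

    def total(a):
        return sum(estimated_utility(job, c) for job, c in a.items())

    current = total(alloc)
    while True:
        best_gain, best_move = 1e-12, None  # tolerance guards against float noise
        for donor in jobs:
            if alloc[donor] <= min_per_job:
                continue  # this job cannot give up a connection
            for receiver in jobs:
                if receiver == donor:
                    continue
                trial = dict(alloc)
                trial[donor] -= 1
                trial[receiver] += 1
                gain = total(trial) - current
                if gain > best_gain:
                    best_gain, best_move = gain, (donor, receiver)
        if best_move is None:
            return alloc  # no exchange increases aggregate utility
        donor, receiver = best_move
        alloc[donor] -= 1
        alloc[receiver] += 1
        current += best_gain
```

Because each accepted exchange strictly increases aggregate utility and the number of possible allocations is finite, the loop terminates. The result is a local optimum with respect to single-connection exchanges, which is not necessarily the global maximum.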
Dsim: Simulation Framework
https://github.com/xwang149/Dsim
Experiment Setup
• Simulate 2 hours of real job traces from a data transfer node (DTN) at Stampede to three different destination DTNs.
• Categorize jobs into small jobs (≤ 1 G) and large jobs (> 1 G)
• Conduct experiments with the maximum number of TCP connections varying from 10 to 50.
• Define the same utility function for every job within a category
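The size-based categorization can be expressed as a one-line rule. The sketch assumes that "1 G" means 10^9 bytes; the slide does not say whether decimal or binary gigabytes are intended.

```python
SMALL_JOB_LIMIT_BYTES = 10**9  # assumption: "1 G" read as 1 decimal gigabyte


def job_category(size_bytes, limit=SMALL_JOB_LIMIT_BYTES):
    """Classify a transfer as 'small' (size <= limit) or 'large' (size > limit)."""
    return "small" if size_bytes <= limit else "large"
```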
Experiment Setup
• Scheduling policies:
– FCFS-U
– SJF-U
• Evaluation metrics:
– Response time
– Data transfer time
– Aggregate job utility
Experimental results
Figure annotations (percentages shown on the charts): 34%, 64%, 52%, 48%, 54%, 149%, 12%, 74%
• The utility optimization model considerably improves job response time, transfer time, and aggregate utility.
CDF of Response Time
• The utility optimization model improves the response time for both small jobs and large jobs
CDF of Transfer Time
• The utility optimization model improves the transfer time, except for some small jobs under the SJF policy
Figure annotation: 51%
CDF of utility
• Small jobs are more sensitive to utility decay; the utility optimization model substantially decreases the percentage of small jobs with zero utility
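A point on an empirical CDF such as the ones summarized in these slides can be computed as the fraction of jobs whose metric is at most a given value. The helper below is a generic sketch with a hypothetical name; evaluating it at zero on per-job utilities gives the share of zero-utility jobs.

```python
from bisect import bisect_right


def cdf_at(values, x):
    """Empirical CDF: fraction of `values` that are less than or equal to `x`."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)
```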
Conclusion
• Designed a utility-based data transfer scheduler that coordinates multiple data transfer requests to improve system performance and overall user satisfaction.
• Implemented our algorithms in an open-source data transfer scheduling simulator and conducted trace-based simulations using real job traces collected from production computing facilities.
• The experimental results demonstrate that our utility optimization model considerably improves job response time, transfer time, and aggregate utility.
Future Work
• Diverse job types (interactive, batch, and real-time jobs with deadlines)
• Various utility functions for different job types
• More complex network topologies
• Introducing dynamic external loads
Thank You!