Scheduling a 100,000 Core Supercomputer for Maximum Utilization and Capability
September 2010
Phil Andrews, Patricia Kovatch, Victor Hazlewood, Troy Baer

Outline
· Intro to NICS and Kraken
· Weekly utilization averages >90% for 6+ weeks
· How 90% utilization was accomplished on Kraken
  – System scheduling goals
  – Policy change based on some past work
  – Influencing end user behavior
  – Scheduling and utilization details: closer look at three specific weeks
· Conclusion and Future Work

National Institute for Computational Sciences
· JICS and NICS are a collaboration between UT and ORNL
· UT was awarded NSF Track 2B ($65M)
· Phased deployment of Cray XT systems, reaching 1 PF in 2009
· Total JICS funding ~$100M

Kraken in October 2009
· #4 fastest machine in the world (Top500, 6/10)
· First academic petaflop
· Delivers over 60% of all NSF cycles
  – 8,256 dual-socket nodes with 16 GB memory each
  – 2.6 GHz 6-core AMD Istanbul processor per socket
  – 1.03 Petaflops peak performance (99,072 cores)
  – Cray SeaStar 2 torus interconnect
· 3.3 Petabytes DDN disk (raw)
· 129 Terabytes memory
· 88 cabinets
· 2,200 sq ft

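The headline figures on this slide can be cross-checked with a little arithmetic. The sketch below uses the node, socket, core, clock, and memory figures from the slide. The value of 4 double-precision floating point operations per core per cycle is an assumption that is not stated on the slide, and the variable names are illustrative only.

```python
# Figures taken from the slide
NODES = 8_256
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 6
CLOCK_HZ = 2.6e9
MEM_PER_NODE_GB = 16

# Assumption (not on the slide): 4 double-precision flops per core per cycle
FLOPS_PER_CYCLE = 4

total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
peak_pflops = total_cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e15
total_mem_tb = NODES * MEM_PER_NODE_GB / 1024  # binary terabytes

print(total_cores)            # 99072
print(round(peak_pflops, 2))  # 1.03
print(round(total_mem_tb))    # 129
```

Under that assumption the core count, peak performance, and total memory all agree with the numbers quoted on the slide.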
Kraken Cray XT5 Weekly Utilization, October 2009 to June 2010
[Figure: bar chart of weekly percent utilization (0 to 100) by date for the Kraken XT5; a red vertical line marks the mid-January scheduling policy change]

Kraken Weekly Utilization
· Previous slide shows:
  – Weekly utilization over 90% for 7 of the last 9 weeks. Excellent!
  – Weekly utilization over 80% for 18 of the last 21 weeks. Very good!
  – Weekly utilization over 70% every week since the new scheduling policy was implemented in mid-January (red vertical line)
· How was this accomplished?

How was 90% utilization accomplished?
· Taking a closer look at Kraken:
  – Scheduling goals
  – Policy
  – Influencing user behavior
  – Analysis of 3 specific weeks
    · Nov 9: one month into production with new configuration
    · Jan 4: during a typical slow month
    · Mar 1: after implementation of policy change

System Scheduling Goals
· 1. Capability computing: allow “hero” jobs that run at or near the 99,072-core maximum job size in order to bring new scientific results
· 2. Capacity computing: provide as many delivered floating point operations as possible to Kraken users (keep utilization high)
· Typically these are antagonistic aspirations for a single system. Scheduling algorithms for capacity computing can lead to inefficiencies
· Goal: improve utilization of a large system while allowing large capability job runs. Attempt to do both capability and capacity computing!
· Prior work at SDSC led to a new approach

Policy
· The normal approach to capability computing is to accept large jobs and include a weighting factor that increases with queue wait time, which eventually drains the system so the large capability job can run.
· The major drawback is that each drain can reduce the overall usage of the system
· Next slide illustrates this

Typical Large System Utilization
[Figure: percent utilization (0 to 100) versus date/time; red arrows indicate system drain for capability job]

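The cost of draining for a single capability job can be made concrete with a rough sketch. Everything here is hypothetical: the linear priority formula, the weights, the function names, and the sample jobs are illustrations under simplifying assumptions, not the scheduler configuration used on Kraken.

```python
def priority(cores, wait_hours, wait_weight=1.0, size_weight=0.001):
    """Hypothetical priority that grows linearly with queue wait time."""
    return size_weight * cores + wait_weight * wait_hours


def drain_idle_core_hours(running):
    """Core-hours left idle while the machine drains.

    running: list of (cores, hours_until_finish) for jobs already running
    when the drain begins. Simplification: no new work is started during
    the drain, so each job's cores sit idle from the moment it finishes
    until the last running job finishes.
    """
    drain_end = max(hours for _, hours in running)
    return sum(cores * (drain_end - hours) for cores, hours in running)


# A full-machine job that has waited longer outranks one that has not,
# which is what eventually forces the drain.
assert priority(99_072, wait_hours=48) > priority(99_072, wait_hours=1)

# Made-up mix of running jobs that together fill 99,072 cores
running = [(40_000, 2.0), (30_000, 6.0), (29_072, 12.0)]
idle = drain_idle_core_hours(running)
print(idle)  # 580000.0
```

In this made-up case the drain leaves 580,000 core-hours unused before the capability job even starts, and that cost is paid again every time a large job reaches the top of the queue.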
Policy Change
· Based on past work at SDSC, the new approach is to drain the system on a periodic basis and run the capability jobs in succession
· Allow “dedicated” job runs: full machine, with only the job owner having access to Kraken. This was needed for file system performance
· Allow “capacity” job runs: near full machine, without dedicated system access
· Schedule dedicated and capacity job runs to coincide with the weekly Preventative Maintenance (PM) window

Policy Change
· A reservation is placed so that the scheduler drains the system prior to the PM
· After the PM, dedicated jobs are run in succession, followed by capacity jobs run in succession
· No PM, no dedicated jobs
· No PM, capacity jobs limited to a specific time period
· This had a dramatic effect on system utilization, as we will show!

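The ordering rules above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `Job` record and a hypothetical `post_drain_order` helper; it is not the scheduler configuration used on Kraken, and the length of the capacity-job window in weeks without a PM is not given on the slide, so it is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    kind: str  # "dedicated" or "capacity" (hypothetical labels)


def post_drain_order(queue, pm_held):
    """Return capability jobs in the order they run after the weekly drain."""
    capacity = [j for j in queue if j.kind == "capacity"]
    if not pm_held:
        # No PM, no dedicated jobs; capacity jobs still run, but only
        # within a limited time period (window length not modelled here).
        return capacity
    dedicated = [j for j in queue if j.kind == "dedicated"]
    # Dedicated jobs run back to back first, then capacity jobs.
    return dedicated + capacity


queue = [Job("c1", "capacity"), Job("d1", "dedicated"), Job("c2", "capacity")]
print([j.name for j in post_drain_order(queue, pm_held=True)])   # ['d1', 'c1', 'c2']
print([j.name for j in post_drain_order(queue, pm_held=False)])  # ['c1', 'c2']
```

Batching the capability jobs this way means the machine pays for one drain per week rather than one drain per large job.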
Influencing User Behavior
· To encourage capability computing jobs, NICS instituted a 50% discount for running dedicated and capacity jobs
· Discounts were applied after job completion

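As a rough illustration of the discount, the sketch below charges a job in core-hours and halves the charge for dedicated and capacity runs. The core-hour charging unit, the function name, and the job kind labels are assumptions for illustration; the slide states only the 50% figure and that it was applied after completion.

```python
def charge_core_hours(cores, wall_hours, kind, discount=0.5):
    """Charge for a completed job, in core-hours (assumed charging unit)."""
    raw = cores * wall_hours
    if kind in ("dedicated", "capacity"):
        # 50% discount for capability-style runs, per the slide
        return raw * (1 - discount)
    return raw


print(charge_core_hours(99_072, 2, "dedicated"))  # 99072.0
print(charge_core_hours(1_000, 2, "normal"))      # 2000
```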
Utilization Analysis
· The following selected weekly utilization charts show the dramatic effects of running such a large system and of implementing the policy change for successive capability job runs

Percent Utilization Prior to Policy Change: 55% average
[Figure: percent CPU utilization (0 to 100) versus date/time, Nov 9-16, 2009]

Percent Utilization During Slow Period: 34% average
[Figure: percent CPU utilization (0 to 100) versus date/time, Jan 4-10, 2010]

Percent Utilization After Policy Change: 92% average, only one system drain
[Figure: percent CPU utilization (0 to 100) versus date/time, March 1-7, 2010]

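The weekly averages and the drain count quoted for these charts are simple statistics over periodic utilization samples. The sketch below shows one way they could be computed. The sample values are made up, and the 10% threshold used to recognise a drain is an assumption; neither comes from the Kraken data.

```python
def average_utilization(samples):
    """Mean of equally spaced percent-utilization samples."""
    return sum(samples) / len(samples)


def count_drains(samples, threshold=10.0):
    """Count separate dips below the threshold (assumed drain signature)."""
    drains = 0
    in_drain = False
    for pct in samples:
        if pct < threshold and not in_drain:
            drains += 1
        in_drain = pct < threshold
    return drains


# Made-up samples: busy machine, one drain, then busy again
samples = [95, 96, 94, 60, 5, 0, 50, 97, 98, 95]
print(average_utilization(samples))  # 69.0
print(count_drains(samples))         # 1
```

Consecutive samples below the threshold are counted as a single drain, so one long maintenance window is not mistaken for several.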
Conclusions
· Running a large computational resource and allowing capability computing can coincide with high utilization if the right balance among goals, policy, and user influences is struck.

Future Work
· Automation of this type of scheduling policy
· Methods to evaluate storage requirements of capability jobs prior to execution, in an attempt to prevent job failures due to file system use
· Automation of dedicated run setup