OSCAR Workload Management
Jeremy Enos
OSCAR Annual Meeting, January 10-11, 2002

Topics
  • Current Batch System – OpenPBS
  • How it Works, Job Flow
  • OpenPBS Pros/Cons
  • Schedulers
  • Enhancement Options
  • Future Considerations
  • Future Plans for OSCAR

OpenPBS
  • PBS = Portable Batch System
  • Components
      • Server – single instance
      • Scheduler – single instance
      • Mom – runs on compute nodes
      • Client commands – run anywhere
          • qsub
          • qstat
          • qdel
          • xpbsmon

OpenPBS - How it Works
  • User submits job with “qsub”
  • Execution host (mom) must launch all other processes
      • mpirun
      • ssh/rsh/dsh
      • pbsdsh
  • Output
      • spooled on execution host (or in user’s home dir)
      • moved back to user node (rcp/scp)
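
As an illustration of the submission step above, here is a minimal sketch of generating a job script that a user would hand to qsub. The helper name `render_job_script`, the job name, and the program path are hypothetical. The `#PBS` directives and the `PBS_O_WORKDIR` / `PBS_NODEFILE` variables are standard PBS, while the `-machinefile` flag assumes an MPICH-style mpirun; other launchers may differ.

```python
def render_job_script(name, nodes, ppn, walltime, program):
    """Return the text of a PBS job script (illustrative helper, not part of PBS)."""
    nprocs = nodes * ppn  # one MPI process per requested processor
    lines = [
        "#!/bin/sh",
        f"#PBS -N {name}",                   # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit, HH:MM:SS
        "#PBS -j oe",                        # merge stderr into stdout
        "cd $PBS_O_WORKDIR",                 # directory qsub was run from
        # The script runs on the execution host, which launches the other
        # processes; an MPICH-style mpirun is assumed here.
        f"mpirun -np {nprocs} -machinefile $PBS_NODEFILE {program}",
    ]
    return "\n".join(lines) + "\n"


script = render_job_script("hello", nodes=4, ppn=2,
                           walltime="00:10:00", program="./hello_mpi")
print(script)
```

Saving the printed text to a file and running qsub on that file would queue the job. The script body itself executes only on the execution host, which is why the launcher line has to start the processes on the remaining nodes.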

OpenPBS – Job Flow
[Diagram: User Node (runs qsub); Server (queues job); Scheduler (tells server what to run); Execution host (mother superior); Compute Nodes; job output returned to the user node via rcp/scp]

OpenPBS – Monitor (xpbsmon)

OpenPBS - Schedulers
  • Stock Scheduler
      • Pluggable
      • Basic, FIFO
  • Maui
      • Plugs into PBS
      • Sophisticated algorithms
      • Reservations
      • Open Source
      • Supported
      • Redistributable
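
To make "Basic, FIFO" concrete, the following is a toy sketch of a strict first-in, first-out policy. It is not the stock PBS scheduler's code, and the function name `fifo_schedule` and the (name, nodes) job tuples are assumptions made for illustration.

```python
from collections import deque


def fifo_schedule(queue, free_nodes):
    """Start jobs strictly in arrival order; stop at the first job that does not fit.

    queue is a list of (job_name, nodes_needed) tuples in arrival order.
    Returns (started job names, waiting job names, nodes still free).
    """
    started = []
    pending = deque(queue)
    while pending and pending[0][1] <= free_nodes:
        name, needed = pending.popleft()
        free_nodes -= needed
        started.append(name)
    return started, [name for name, _ in pending], free_nodes


jobs = [("a", 4), ("b", 8), ("c", 2)]
started, waiting, free = fifo_schedule(jobs, free_nodes=8)
print(started, waiting, free)
```

In this run job "c" would fit in the four idle nodes, but strict FIFO leaves it queued behind "b". Idle capacity of this kind is what more sophisticated policies, such as the reservations listed for Maui, are meant to put to use.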

OpenPBS – in OSCAR 2
  1. List of available machines
  2. Select PBS for queuing system
      1. Select one node for server
      2. Select one node for scheduler
          1. Select scheduler
      3. Select nodes for compute nodes
      4. Select configuration scheme
          • staggered mom
          • process launcher (mpirun, dsh, pbsdsh, etc.)

OpenPBS – On the Scale
Pros
  • Open Source
  • Large user base
  • Portable
  • Best option available
  • Modular scheduler
Cons
  • License issues
  • 1 year+ devel lag
  • Scalability limitations
      • number of hosts
      • number of jobs
      • monitor (xpbsmon)
  • Steep learning curve
  • Node failure intolerance
  • Not developed on Linux

OpenPBS – Enhancement Options
  • qsub wrapper scripts/java apps
      • easier for users
      • allows for more control of bad user input
  • 3rd party tools, wrappers, monitors
  • Scalability source patches
  • “Staggered moms” model
      • large cluster scaling
  • Maui Silver model
      • “Cluster of clusters” diminishes scaling requirements
      • never attempted yet
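
The qsub wrapper idea can be sketched as a small front end that checks a request before it reaches the server. Everything here is hypothetical except the `qsub -l nodes=...:ppn=...,walltime=...` resource syntax: the limits, the function names, and the allowed ppn values are assumptions for a site with dual-processor nodes.

```python
import re

MAX_NODES = 64  # hypothetical site limit
WALLTIME_RE = re.compile(r"^\d{1,3}:[0-5]\d:[0-5]\d$")  # HH:MM:SS


def check_request(nodes, ppn, walltime):
    """Return a list of problems with a job request; an empty list means it is acceptable."""
    problems = []
    if not 1 <= nodes <= MAX_NODES:
        problems.append(f"nodes must be between 1 and {MAX_NODES}")
    if ppn not in (1, 2):  # assumes dual-processor compute nodes
        problems.append("ppn must be 1 or 2")
    if not WALLTIME_RE.match(walltime):
        problems.append("walltime must look like HH:MM:SS")
    return problems


def build_qsub_args(nodes, ppn, walltime, script_path):
    """Validate the request, then return the qsub argument list to execute."""
    problems = check_request(nodes, ppn, walltime)
    if problems:
        raise ValueError("; ".join(problems))
    return ["qsub", "-l", f"nodes={nodes}:ppn={ppn},walltime={walltime}", script_path]


print(build_qsub_args(4, 2, "00:10:00", "job.sh"))
print(check_request(0, 3, "ten minutes"))
```

A wrapper along these lines would pass the returned list to something like `subprocess.run` on a host where qsub is installed. Rejecting bad requests up front gives users a readable error instead of a job that sits in the queue or fails on the execution host.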

Future Considerations for OSCAR
  • Replace OpenPBS (with what? when?)
      • large clusters are still using PBS
  • Negotiate better licensing with Veridian
  • Continue incorporating enhancements
      • test Maui Silver, staggered mom, etc.
      • 3rd party extras, monitoring package