Slot Acquisition
Presenter: Daniel Nurmi

Scope
• One aspect of a VGDL request is the time ‘slot’ when resources are needed
  – Earliest time when resource set is needed
  – Maximum duration resource set will be used
• Three classes of resources
  – dedicated: always available
  – batch controlled: lag before available
  – advanced reservation: guaranteed availability in the future

Acquisition Routines
• Each class of resource needs the following (logical) routines
  – Prob = Query(cluster, nodes, walltime, starttime)
  – Id = BindInit(cluster, nodes, walltime, starttime, success_prob)
  – Status = Check(id)
  – Status = Install(id)
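The four routines can be read as a small interface that every resource class implements. The sketch below is one possible Python rendering, not the presenter's code: the class names, the snake_case method names, the use of seconds for `walltime` and `starttime`, and the boolean status are all assumptions. The example implementation covers only the simplest case from the Scope slide, a resource that is always available.

```python
from abc import ABC, abstractmethod
import itertools


class SlotAcquirer(ABC):
    """Hypothetical interface mirroring Query, BindInit, Check and Install."""

    @abstractmethod
    def query(self, cluster: str, nodes: int, walltime: int, starttime: int) -> float:
        """Return the probability (0.0 to 1.0) of obtaining the slot."""

    @abstractmethod
    def bind_init(self, cluster: str, nodes: int, walltime: int,
                  starttime: int, success_prob: float) -> int:
        """Begin acquiring the slot and return an id for later calls."""

    @abstractmethod
    def check(self, slot_id: int) -> bool:
        """Return True once the slot is ready to be installed."""

    @abstractmethod
    def install(self, slot_id: int) -> bool:
        """Set up the slot for use; return True on success."""


class AlwaysAvailable(SlotAcquirer):
    """Illustrative implementation for a resource that is always available."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def query(self, cluster, nodes, walltime, starttime):
        return 1.0  # nothing to predict: the resource is always there

    def bind_init(self, cluster, nodes, walltime, starttime, success_prob):
        return next(self._ids)  # no binding work needed, just hand out an id

    def check(self, slot_id):
        return True

    def install(self, slot_id):
        # Placeholder: a real implementation would set up the slot here.
        return True
```

Keeping `bind_init` separate from `check` lets a caller start an acquisition and poll it later, which matters for resource classes where there is a lag before the slot is available.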

Slot Manager Acquisition Procedure
[Diagram: requests to the Slot Manager (‘Is available?’, ‘Bind’) and the routines it calls]
• Query probability: Query()
• Initiate bind: BindInit()
• Bind yet?: Check() (true/false/abort)
• Install PBS glide-in when time: Install()
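The procedure above can be sketched as a short driver loop. This is an illustration under stated assumptions, not the Slot Manager's actual code: the `Status` enum, the `acquire_slot` function, the `min_prob` threshold for answering ‘Is available?’, and the polling interval are all hypothetical, and `routines` stands for any object that offers the four routines.

```python
import enum
import time


class Status(enum.Enum):
    """Hypothetical encoding of the true/false/abort result."""
    TRUE = "true"
    FALSE = "false"
    ABORT = "abort"


def acquire_slot(routines, cluster, nodes, walltime, starttime,
                 min_prob=0.75, poll_seconds=30.0, sleep=time.sleep):
    """Run Query, BindInit, Check and Install in order; True on success."""
    # Is available? Ask for the probability and compare with a threshold
    # (the threshold rule is an assumption, not stated on the slide).
    prob = routines.query(cluster, nodes, walltime, starttime)
    if prob < min_prob:
        return False

    # Bind: start the acquisition and remember its id.
    slot_id = routines.bind_init(cluster, nodes, walltime, starttime, prob)

    # Bind yet? Poll until the slot is ready or the attempt is aborted.
    while True:
        status = routines.check(slot_id)
        if status is Status.TRUE:
            break
        if status is Status.ABORT:
            return False
        sleep(poll_seconds)

    # Install the PBS glide-in when it is time.
    return routines.install(slot_id) is Status.TRUE
```

Passing `sleep` as a parameter keeps the loop testable: a test can substitute a function that records the requested delays instead of waiting.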

Dedicated
• Query
  – NOP (prob = 1)
• BindInit
  – NOP (always true)
• Check
  – NOP (always true)
• Install
  – Installs PBS glide-in

Advanced Reservation
• Query
  – Makes request to advanced reservation system
  – Prob = 1 if we can make the reservation
  – Prob = 0 if we cannot
• BindInit
  – Make adv. res. request
• Check
  – NOP (always return true)
• Install
  – Submit PBS glide-in installation job to specialized adv. res. queue

Batch Controlled
• Query
  – Performs an algorithm to determine probability of meeting the slot requirement through regular batch queue
• BindInit
  – Use values calculated from ‘query’ for job dimensions and time to wait before submission
• Check
  – When ‘time to wait’ has elapsed, return true
• Install
  – Submit PBS glide-in installation job
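A minimal sketch of how the batch-controlled routines might fit together follows. Everything named here is hypothetical: `plan` stands in for the prediction step and is assumed to return a probability, a time to wait, and the walltime to request; `submit` stands in for submitting the glide-in installation job; `clock` is injectable so the timing logic can be exercised without real waiting.

```python
import itertools
import time


class BatchControlled:
    """Illustrative batch-controlled slot acquirer (names are assumptions)."""

    def __init__(self, plan, submit, clock=time.time):
        self._plan = plan        # (cluster, nodes, walltime, starttime) -> (prob, wait, real_walltime)
        self._submit = submit    # (cluster, nodes, real_walltime) -> bool
        self._clock = clock
        self._plans = {}         # values remembered from query()
        self._bound = {}         # slot id -> job dimensions and submit time
        self._ids = itertools.count(1)

    def query(self, cluster, nodes, walltime, starttime):
        prob, wait, real_walltime = self._plan(cluster, nodes, walltime, starttime)
        # Keep the computed values so bind_init can reuse them.
        self._plans[(cluster, nodes, walltime, starttime)] = (wait, real_walltime)
        return prob

    def bind_init(self, cluster, nodes, walltime, starttime, success_prob):
        # success_prob is accepted for interface symmetry; unused in this sketch.
        wait, real_walltime = self._plans[(cluster, nodes, walltime, starttime)]
        slot_id = next(self._ids)
        submit_at = self._clock() + wait
        self._bound[slot_id] = (cluster, nodes, real_walltime, submit_at)
        return slot_id

    def check(self, slot_id):
        # True once the planned time to wait has elapsed.
        return self._clock() >= self._bound[slot_id][3]

    def install(self, slot_id):
        cluster, nodes, real_walltime, _ = self._bound[slot_id]
        return self._submit(cluster, nodes, real_walltime)
```

In this reading, `check` does not look at the queue at all; it only tells the caller when the delayed submission should happen.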

The Algorithm
• Routines
  – ‘deadline’ is ‘seconds from now’
  – P = bqp_pred(machine, nodes, walltime, deadline)
• Algorithm

    Preq = 0.75
    past = 0
    P = bqp_pred(M, N, W+D, D)
    while ((D-past) > 0) {
        if (P ~ Preq) {
            wait = past
            real_walltime = W+(D-past)
        }
        past += 30
        P = bqp_pred(M, N, W+(D-past), (D-past))
    }
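The loop above can be written as a runnable Python sketch. Two points are assumptions rather than statements from the slide: `P ~ Preq` is interpreted as `P >= Preq`, and `bqp_pred` is passed in as a function because its real implementation is not shown. `toy_pred` is a made-up predictor used only to exercise the loop.

```python
from typing import Callable, Optional, Tuple


def find_submit_delay(bqp_pred: Callable[[str, int, int, int], float],
                      machine: str, nodes: int, walltime: int, deadline: int,
                      p_req: float = 0.75, step: int = 30
                      ) -> Optional[Tuple[int, int]]:
    """Return (wait, real_walltime) for the latest acceptable delay, or None."""
    best = None
    past = 0
    while deadline - past > 0:
        remaining = deadline - past
        # Ask for a job that covers the leftover time before the slot starts
        # plus the requested walltime, starting within the leftover time.
        p = bqp_pred(machine, nodes, walltime + remaining, remaining)
        if p >= p_req:  # assumption: "P ~ Preq" means "at least Preq"
            best = (past, walltime + remaining)
        past += step
    return best


def toy_pred(machine: str, nodes: int, walltime: int, deadline: int) -> float:
    """Made-up predictor: more time before the deadline means higher odds."""
    return min(1.0, deadline / 100.0)


print(find_submit_delay(toy_pred, "m", 4, 600, 120))  # prints (30, 690)
```

Because later acceptable delays overwrite earlier ones, the function returns the last delay that still meets the target probability, which keeps the requested walltime as short as possible.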

Batch Experiment
• 75% is the target probability
• 356 total requests
• 257 total batch submissions
  – 99 requests resulted in initial ‘not possible’ response
• 192 slots successfully acquired
• 257 × 0.75 ≈ 193
• Choose last acceptable time to minimize waste
[Figure: timeline from ‘now’ to ‘submit time’, labeled 0.75]
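The arithmetic behind these figures can be checked in a few lines. The numbers come from the slide; the variable names are illustrative.

```python
total_requests = 356
not_possible = 99      # requests answered 'not possible' at the start
acquired = 192         # slots successfully acquired
target_prob = 0.75

submissions = total_requests - not_possible   # requests that reached the queue
expected = submissions * target_prob          # successes expected at the target
observed = acquired / submissions             # success rate actually seen

print(f"submissions={submissions}, expected={expected:.2f}, observed={observed:.3f}")
# prints: submissions=257, expected=192.75, observed=0.747
```

The observed rate of about 74.7% is within one slot of the 75% target.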

Near Term Experiments
• Try other probability levels
• Try other deadlines

PBS Glide-in
• Basic batch queue system assumes one-to-one mapping of job to resource set (slot)
• Idea: once a single ‘slot’ has been acquired, install ‘personal’ res. manager and scheduler within it in order to support multiple jobs within single slot
• Have instrumented Torque (PBS) to fulfill this task
  – Plays the role that Condor would play as infrastructure scheduler
  – PBS “glide-in”
  – Simpler, supports MPI, etc.

PBS Overview
[Diagram: PBS Server, PBS Sched, and PBS Mom managing node 1 to node 4]
• qsub ‘scriptA’
• Transfer scriptA
• scriptA gets node 1, node 2, and node 3

PBS Overview
[Diagram: PBS Server, PBS Sched, and PBS Mom with node 1 to node 4; labels: scriptA, ssh, cmd, cmd]

PBS glide-in
[Diagram: qsub pglide.pbs submitted to PBS Server, PBS Sched, and PBS Mom managing node 1 to node 4]

PBS glide-in
[Diagram: PBS Server, PBS Sched, and PBS Mom with the glide-in running on the acquired nodes]
• node 1: pglide.pbs_server pbs_sched
• node 2: pbs_mom
• node 3: pbs_mom
• node 4: pbs_mom

PBS glide-in
[Diagram: jobs submitted through GRAM into the glide-in; host PBS Server, PBS Sched, and PBS Mom underneath]
• globusrun-ws jobA, globusrun-ws jobB (via GRAM)
• qsub scriptA, qsub scriptB
• node 1: pglide.pbs_server pbs_sched
• node 2: pbs_mom, scriptA
• node 3: pbs_mom
• node 4: pbs_mom, scriptB

PBS glide-in TODO
• In order to implement this, needed to disable some of PBS's internal security features (drop privs, root check, priv ports, user auth checks, host auth checks)
• Streamline installation process (good but not great)
• Architecture discussion: one server per slot? One server for all slots on a single machine?
  – Requires reworking Torque software a bit

Slot Acquisition Status
• BQP ‘virtual advanced reservation’ system in place
• PBS glide-in working on all machines Dan has access to
• Need to investigate advanced reservation interface(s)
• Need to figure out how to properly submit PBS jobs using GRAM

Thanks!
• Questions?

Statistics TODO
• More reactive change point detection
  – Machine down time constitutes a change point we can detect better
  – Better understanding of autocorrelation and quantiles
• Non-statistical case
  – One user submits 20,000 single processor jobs

Current Cluster Status
Columns: Dedicated, Batch Controlled, Advanced Res.
• Dante: X ?
• NCSA Mercury: X ?
• SDSC Teragrid: X ?
• ADA: X ?
• IU BigRed: X ?
• IU Tyr: X ?
• IU TG