
Using the BYU Supercomputers

maryloux
• 128 nodes
  – 2 Pentium 4 processors @ 2.4 GHz
  – 2 GB RAM
• Myrinet interconnect
  – If you use mpicc, mpiCC, mpif77 and mpirun, the Myrinet is configured automatically
• Local scratch disk on each node
  – NFS to your home directory

marylou
• Silicon Graphics (SGI) Origin 3900 series
• 128 processors
• 64 GB shared memory
• 8 processors for login
• 120 processors for batch
• Local scratch disk available

Compilers
• GNU
  – gcc
  – g++
  – g77
• Portland Group
  – pgcc
  – pgCC
  – pgf77
• Parallel (automatically links with MPI libraries)
  – mpicc
  – mpiCC
  – mpif77

Other Stuff
• Documentation
  – http://marylou.byu.edu
• Launching parallel jobs
  – done through the batch scheduler
  – Your job is a shell script that you hand to the batch scheduler for execution

Batch job scheduler
• Batch schedulers
  – PBS (Portable Batch System), open source
• The process
  – user submits jobs to a queue
  – machines register with the scheduler, offering to run jobs of a certain class
  – scheduler allocates jobs to machines and tracks them
  – once started, a job's processes are scheduled by the operating system kernel

Scheduling parallel jobs
• Jobs can ask for
  – number of nodes (1 CPU)
  – number of tasks per node (multiple CPUs)
  – non-shared nodes (multiple CPUs)
• Mixing jobs can be bad
  – two I/O-intensive processes on a 2-CPU node can ruin performance for both
  – same for two RAM-intensive processes
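The node and tasks-per-node requests above can be sketched as a resource string. This is a minimal sketch that assumes the OpenPBS/TORQUE form nodes=N:ppn=M is accepted on these machines; the helper names node_spec and total_procs are hypothetical, and a.out stands in for your own program.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PBS): build a "-l" resource request
# and the matching MPI process count.
# Assumption: the scheduler accepts the nodes=N:ppn=M form.

# node_spec NODES TASKS_PER_NODE -> resource string for "#PBS -l"
node_spec() {
    echo "nodes=$1:ppn=$2"
}

# total_procs NODES TASKS_PER_NODE -> number of MPI processes to start
total_procs() {
    echo $(( $1 * $2 ))
}

# Example: 4 nodes, both CPUs on each node.
echo "#PBS -l $(node_spec 4 2)"
echo "mpirun -np $(total_procs 4 2) a.out"
```

Asking for both CPUs of a node in one job is one way to avoid sharing that node with another user's I/O-intensive or RAM-intensive process.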

Scheduling parallel jobs (2)
• All requested nodes, processors, and resources stay allocated for the entire duration of the job
• No dynamic adjustments, except by creating jobs with multiple steps
  – each step can have different requirements
  – each step can express a dependency on other steps
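One way to express a dependency between steps is qsub's -W depend=afterok:<job id> option, which holds a job until the named job has finished successfully. The sketch below only prints the commands rather than submitting anything; the helper qsub_cmd, the script names step1.sh and step2.sh, and the job ID are all hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print the qsub command lines for a two-step job.
# Nothing is submitted; qsub_cmd is a hypothetical helper.

# qsub_cmd SCRIPT [JOB_ID_TO_WAIT_FOR]
qsub_cmd() {
    if [ -n "$2" ]; then
        # Start only after job $2 has exited successfully.
        echo "qsub -W depend=afterok:$2 $1"
    else
        echo "qsub $1"
    fi
}

qsub_cmd step1.sh
qsub_cmd step2.sh 1234.marylou   # 1234.marylou is a made-up job ID
```

In a real session the job ID comes from the output of the first qsub, so each step can carry its own resource requests while still running in order.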

Scheduling parallel jobs (3)
• Management must
  – allow some jobs to use the entire machine
  – allow short jobs to get started quickly; they should not have to wait weeks in the queue
• Some very long jobs may be needed, but are to be avoided

Backfill scheduling
[Figure: chart of system nodes against time, showing how Jobs A, B, C (10 nodes), and D are placed in the schedule]

Backfill scheduling
• Requires a real (wall-clock) time limit to be set for each job
• A more accurate (shorter) estimate gives a job a better chance of running earlier
• Short jobs can move through the system more quickly
• Uses the system better by avoiding wasted cycles while larger jobs wait
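The backfill idea can be sketched as a simple test: a waiting job may start early if it fits in the idle nodes and its time limit ends before the next reserved job is due to start. This is a simplified illustration, not the scheduler's actual algorithm; the function name and its arguments are hypothetical.

```shell
#!/bin/sh
# Simplified backfill test (illustration only, not the real scheduler logic).
# backfill_decision IDLE_NODES GAP_MINUTES JOB_NODES JOB_LIMIT_MINUTES
#   IDLE_NODES        nodes currently free
#   GAP_MINUTES       time until the next reserved job must start
#   JOB_NODES         nodes the waiting job asks for
#   JOB_LIMIT_MINUTES wall-clock limit the waiting job declared
backfill_decision() {
    if [ "$3" -le "$1" ] && [ "$4" -le "$2" ]; then
        echo "backfill"
    else
        echo "wait"
    fi
}

backfill_decision 4 120 2 60    # fits in nodes and time
backfill_decision 4 120 2 180   # declared limit is longer than the gap
backfill_decision 4 120 8 60    # needs more nodes than are idle
```

The second call shows why a shorter, accurate limit helps: the same job with a 60-minute limit would have started immediately.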

Using PBS
• Basic commands
  – qsub
  – qdel
  – qstat or showq
  – pbsnodes -a
• Other commands
  – qalter: Alter a batch job
  – qhold: Hold a batch job
  – qmsg: Send a message to a batch job
  – qmove: Move a batch job to another queue
  – qrls: Release held jobs
  – qrerun: Rerun a batch job
  – qselect: Select a specific subset of jobs
  – qsig: Send a signal to a batch job
  – pbsdsh: Run a shell command on all nodes allocated

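#!/bin/sh
# Wall-clock limit of 60 minutes, 800 MB of memory, 4 CPUs
#PBS -l walltime=60:00
#PBS -l mem=800mb
#PBS -l ncpus=4
# Merge standard error into standard output
#PBS -j oe
# Run from the scratch directory and save the program output
cd /scratch1/username/test
mpirun -n 4 test.exe > test.out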
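PBS reads walltime as [[hours:]minutes:]seconds, so walltime=60:00 above means 60 minutes. A small helper can turn a limit in minutes into the unambiguous HH:MM:SS form; walltime_from_minutes is a hypothetical name, not a PBS command.

```shell
#!/bin/sh
# Hypothetical helper: format a limit given in whole minutes as HH:MM:SS
# for use with "#PBS -l walltime=...".
walltime_from_minutes() {
    printf '%02d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

echo "#PBS -l walltime=$(walltime_from_minutes 60)"
echo "#PBS -l walltime=$(walltime_from_minutes 90)"
```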

# Interpret this program using the bash shell
#PBS -S /bin/bash
# Send email to my account on this server when this job begins, ends, or aborts
#PBS -m abe
# Use a max of 2 CPUs
#PBS -l nodes=2
# Name this job something meaningful to YOU
#PBS -N my_pbs_job
# Change to my directory
cd my_prog_directory
# Run my program and save results to my_pbs_job.out
mpirun -np 2 a.out

Sample showq output

bash-2.05a$ showq
ACTIVE JOBS----------
JOBNAME          USERNAME   STATE     PROC   REMAINING   STARTTIME
m1015i.1581.0    taskman    Running   1      18:39:00    Wed Aug 14 08:06:24
…
35 Active Jobs   150 of 184 Processors Active (81.52%)
                 26 of 34 Nodes Active (76.47%)

IDLE JOBS-----------
JOBNAME          USERNAME   STATE     PROC   WCLIMIT     QUEUETIME
m1015i.1513.0    jl447      Idle      2      5:00:00     Tue Aug 13 07:08:09
m1015i.1572.0    dvd        Idle      8      3:00:00     Tue Aug 13 10:45:18
…
23 Idle Jobs

NON-QUEUED JOBS--------
JOBNAME          USERNAME   STATE     PROC   WCLIMIT     QUEUETIME

Total Jobs: 58   Active Jobs: 35   Idle Jobs: 23   Non-Queued Jobs: 0
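The percentages in the showq summary are simply busy divided by total. As a quick check, the sketch below recomputes them from the counts in the sample (150 of 184 processors, 26 of 34 nodes); the function name utilization is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: percentage of a resource in use, to two decimals.
# utilization BUSY TOTAL
utilization() {
    awk -v busy="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * busy / total }'
}

echo "Processors active: $(utilization 150 184)%"
echo "Nodes active: $(utilization 26 34)%"
```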