How to use the HPCC to do stuff

Presentation to QuERG, March 28th, 2016
*Matt’s soothing voice is not a source of necessary nutrients

What is HPCC? And iCER?
§ High Performance Computing Center
§ Collection of computers with 600 nodes and 7000 computing cores, including large-memory nodes (6 TB)
§ Has lots of available software
  § https://wiki.hpcc.msu.edu/display/hpccdocs/Installed+Software
§ iCER is a research unit that maintains MSU’s supercomputer system
§ Provides 1-on-1 consulting

When would I use the HPCC?
§ Computation takes too long
§ Runs out of memory
§ Needs licensed software
§ Reads/writes lots of data

How do I connect to the HPCC
You need to:
1. Set up an account with HPCC
2. ssh to HPCC
3. Log in to gateway node
4. ssh to developer node to run test code and submit jobs
5. Transfer files using SFTP connection
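
The two ssh hops in the steps above can be sketched as a dry run. This is only an illustration: `netid` is a placeholder user name, the host names are the ones that appear elsewhere in these slides, and the script prints the commands instead of running them because each hop opens an interactive session.

```shell
#!/bin/bash
# Dry-run sketch: prints the two ssh hops rather than executing them.
netid="netid"             # placeholder; replace with your own user name
gateway="hpcc.msu.edu"    # gateway host name given in the slides
dev_node="dev-intel14"    # development node used in the slides' example

gateway_cmd="ssh ${netid}@${gateway}"   # step 2-3: typed on your own computer
dev_cmd="ssh ${dev_node}"               # step 4: typed once you are on the gateway

echo "On your computer: ${gateway_cmd}"
echo "On the gateway:   ${dev_cmd}"
```

Pick the development node based on current usage rather than always using the one shown here.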

How to request a new account
Have your PI fill out the form using the link below:
http://www.hpcc.msu.edu/request

Once you have an account...
§ Log in using an ssh (Secure Shell) program
§ Many program options to choose from
  § MobaXterm
  § PuTTY
  § Terminal on Macs
§ The first two can be downloaded or obtained on portable flash drives from iCER
§ Host Name (or IP address) will be hpcc.msu.edu
§ Might be better to use rsync.hpcc.msu.edu

Login (screenshots: PuTTY, MobaXterm)

Logged into gateway

Gateway nodes
§ Shared by everyone with an HPCC account
§ Only means of accessing the HPCC computing resources
§ ****DO NOT RUN ANYTHING ON THESE NODES!!******

Switching to developer mode
§ Look at the developer node usage
§ Choose one that has low or medium usage
§ Use ssh to switch to a developer node, e.g.:
    [user@gateway-00 ~]$ ssh dev-intel14
  § [user@gateway-00 ~]$ appears automatically; it tells you which node you are logged into and the name of the current folder
  § ~ means Home folder
  § ssh dev-intel14 is the code you type to switch to dev-intel14

Begin working
§ The command line takes unix commands to do work
§ Can use many different text editors to edit files
  § emacs file.ext
  § nano file.ext
  § vi file.ext
  § joe file.ext
§ cat file.ext (this only prints what is in the file; it cannot edit)

Basic Linux Commands
Command              Meaning
§ cd directory       Change directory
§ cd ..              Up one directory (to the parent)
§ cd ../..           Up two directories
§ cd -               Return to previous directory
§ cd ~               Go to home/username
§ mkdir directory    Make named directory
§ rmdir directory    Remove an empty directory

Basic Linux Commands
Command    Meaning
§ ls       Show contents of current folder
Some options for the ls command
  -a    list all files and directories, including hidden ones
  -F    append indicator (one of */=@|) to entries
  -h    print sizes in human-readable form (used with -l)
  -l    use a long listing format
  -t    sort by modification time

More Linux commands
Command                       Meaning
§ cp source destination       Copy files
§ cp -r source destination    Copy recursively: files and directories
§ mv source destination       Move a file (can be used as a rename command!)
§ pwd                         Show current path
§ rm filename.ext             Remove file
§ rm -r folder/               Remove directory recursively (i.e. including all subdirs and files)
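
The commands from the last few slides can be tried together in a short practice session. This is a sketch, not part of the original slides: the file and folder names are made up, and everything happens inside a temporary directory so nothing real is touched.

```shell
#!/bin/bash
# Work inside a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                    # make a named directory
cd project                       # change into it
pwd                              # show current path
echo "x <- 1" > analysis.r       # create a small file to play with
cp analysis.r backup.r           # copy a file
mv backup.r old_analysis.r       # rename (move) the copy
ls -l                            # long listing of the folder
cd ..                            # go up one directory
cp -r project project_copy       # copy a directory recursively
rm project/old_analysis.r        # remove a single file
rm -r project_copy               # remove a directory and everything in it
listing=$(ls project)            # what is left in project/
echo "Remaining in project/: ${listing}"
```

Note that rm and rm -r delete immediately; there is no trash folder to recover from.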

How to find commands? How to find how to do something?
§ Google is usually the first place to look
§ An exhaustive list
  § http://ss64.com/bash/
§ A useful cheatsheet
  § http://fosswire.com/post/2007/08/unixlinux-command-cheatsheet/
§ Explain a command given to you
  § http://explainshell.com/

MSU HPCC specific commands
Command         Meaning
§ sj            Show jobs
§ gmod          Show the home screen with development node use levels
§ qsub          Submit a job to the scheduler
§ getexample    Show a list of available examples that can be loaded to the current directory

Module
§ This is how you access different software
§ module list : print list of the loaded programs
§ module load OpenBUGS : load OpenBUGS
§ module unload modulename : unload a module
§ module spider keyword : search the modules for a keyword and list what can be loaded, including the different versions
§ module purge : unload all modules

Working on the HPCC
§ Can run small tasks on the developer nodes
§ If a task runs for longer than 2 hours or uses too much memory, it can be canceled without warning
§ Use developer nodes to test your code and determine the resources you need to run jobs
§ When something takes a long time to run, submit it as a job to the scheduler
  § qsub myjob.sub

What do I mean by a Job?
§ A job is just a list of commands, written in a job script, that I tell the computers to run
§ It also contains the resource requirements to run the job
§ The script is submitted to the scheduler and eventually runs on the computers when the resources you requested become available
§ When the job is done running, you can check your results by looking at the files (etc.) it created

How the scheduler works
§ Ranks submitted jobs based on a priority system
§ Priority is influenced by how long jobs have been in the queue and the resources they request
§ Large jobs may be in the queue for a long time before they begin running
§ Jobs shorter than 4 hours typically start running the fastest (sometimes immediately)

Creating job scripts
§ What is needed in a job script?
  § List of required resources
    § Run time
    § Memory
    § Number of nodes
    § Number of cores per node
  § All command line instructions needed to run the computations

Typical submission script
[Annotated screenshot of a job script. Labels: Special system command, Login to shell, Resource requests, Shell command, Special environment variables, Instructions to scheduler]

Job Script details
§ # is normally a comment, except
  § #! Special system commands
    § #!/bin/bash -login : this logs you in so you can run the job
  § #PBS instructions to the scheduler
    § #PBS -l nodes=n:ppn=p
    § #PBS -l walltime=hh:mm:ss
    § #PBS -l mem=2GB (!!not per core but total memory)

Instructions to scheduler
§ All lines starting with #PBS need to be above the first non-commented line in the script. If they are below the first non-commented line of code, the scheduler will not read them, leading to unexpected behavior.
§ More options at http://wiki.hpcc.msu.edu/x/Np-T
§ All jobs must have
  § Walltime requested
  § Memory requested
  § Number of nodes and processors per node (ppn) requested
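
The placement rule above can be illustrated with a small sketch that writes a minimal job script and then checks that no #PBS line comes after the first real command. The resource values are placeholders, the nodes=1:ppn=1 form assumes the usual Torque/PBS syntax, and the awk check is only an illustration, not an HPCC tool.

```shell
#!/bin/bash
# Write a minimal job script into a temporary folder (values are placeholders).
tmpdir=$(mktemp -d)
jobfile="${tmpdir}/myjob.sub"

cat > "$jobfile" <<'EOF'
#!/bin/bash -login
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l mem=2gb
cd "${PBS_O_WORKDIR}"
echo "Hello from job ${PBS_JOBID}"
EOF

# Count #PBS lines that appear after the first non-comment, non-blank line.
misplaced=$(awk '
    /^#PBS/ && seen      { count++ }    # directive found after code has started
    !/^#/ && !/^[ \t]*$/ { seen = 1 }   # first non-comment, non-blank line
    END                  { print count + 0 }
' "$jobfile")

echo "Misplaced #PBS lines: ${misplaced}"
```

On the cluster the script would then be submitted with qsub; any #PBS line the check flags would be ignored by the scheduler.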

Advanced Environment Variables
§ PBS_JOBID
  § The job number of the current job
§ PBS_O_WORKDIR
  § The working directory from which the job was submitted
§ ${} tells the computer that this is a variable
  § e.g. mkdir ${PBS_O_WORKDIR}/${PBS_JOBID}
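
A common use of these two variables is to give every job its own output folder. The sketch below assumes placeholder resource requests; the fallback values after `:-` are an addition for illustration so the script can also be tried outside the scheduler, where both variables are unset.

```shell
#!/bin/bash -login
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -l mem=1gb

# Under the scheduler these are set automatically; the fallbacks are
# only for trying the script by hand (an assumption, not from the slides).
workdir="${PBS_O_WORKDIR:-$(mktemp -d)}"
jobid="${PBS_JOBID:-local_test}"

outdir="${workdir}/${jobid}"     # one folder per job, named after the job ID
mkdir -p "${outdir}"
cd "${outdir}"

echo "results go here" > output.txt
echo "Output written to ${outdir}/output.txt"
```

Because the folder name includes the job ID, repeated submissions of the same script do not overwrite each other's results.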

Ways to run R
§ Rscript myRprogram.r
§ R < myRprogram.r --no-save
  § Does not save workspace
§ Or R < myRprogram.r --save
  § Saves workspace
§ Or R < myRprogram.r --vanilla
  § Does not read any user or site profiles or restored data at startup and does not save data files at exit
§ R CMD BATCH myRprogram.r
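
As a quick check of the first option, the sketch below writes a one-line R program and runs it with Rscript if it is available. The R code is a made-up example, and the guard is there because R may not be on the path until an R module is loaded (the exact module name is not given in these slides).

```shell
#!/bin/bash
# Create a tiny R program in a temporary folder.
rdir=$(mktemp -d)
rfile="${rdir}/myRprogram.r"
printf '%s\n' 'cat(sum(1:10), "\n")' > "$rfile"

# Run it only if Rscript can be found in this shell.
if command -v Rscript >/dev/null 2>&1; then
    Rscript --vanilla "$rfile"
    r_status="ran"
else
    echo "Rscript not found; load an R module first"
    r_status="skipped"
fi
```

Inside a job script the same Rscript line would go after the #PBS directives.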

Submitting jobs
§ qsub submission.script
  § Submitting a job to the queue returns a job ID#
  § Typically looks like 5945571.cmgr0
§ Time to completion = Queue Time + Run Time

Checking on your jobs
§ qdel jobID#
  § Delete a job from the queue
§ showq -u userid
  § Show the current job queue of the user
§ sj
  § Show the status of jobs (running, eligible, or blocked)
§ checkjob jobID#
  § Check status of the job
§ showstart -e all jobID#
  § Show the estimated start time of the job

Scheduling tips
§ Requesting more resources does not make a job run faster unless you are running a parallel program
§ Requesting more resources makes it “harder” for the scheduler to reserve those resources
§ First time: over-estimate how many resources you need, and then modify appropriately
§ qstat -f ${PBS_JOBID}
  § Put this line at the bottom of the script to show resource usage information when the job is done

Advanced Scheduling Tips
§ A large portion of the clusters is buy-in nodes, which can only run jobs shorter than 4 hours
§ Most nodes have at least 24 GB of memory
§ Half have at least 64 GB of memory
§ Few have more than 64 GB of memory
  § (i.e. it is harder to schedule jobs that request lots of memory)

System Limitations
§ 10 eligible jobs in the queue (others will be temporarily blocked until jobs start running)
§ 520 running cores (nodes*ppn)
§ 1000 submitted jobs
§ 1 week of walltime
§ ppn=64
§ 2 TB memory on a single core
§ ~200 GB Hard Drive
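
The core limit is a simple product, so a request can be checked before submitting. In this sketch the limits are the numbers from the slide above, while the request of 10 nodes with 20 cores each is a made-up example.

```shell
#!/bin/bash
# Limits taken from the slide; the request below is hypothetical.
max_cores=520
max_ppn=64

nodes=10
ppn=20

cores=$(( nodes * ppn ))    # total cores the request would occupy

if (( cores <= max_cores && ppn <= max_ppn )); then
    status="within limits"
else
    status="over limits"
fi

echo "Request: ${nodes} nodes x ${ppn} ppn = ${cores} cores (${status})"
```

Keep in mind the 520-core limit counts all of your running jobs together, so several smaller jobs can hit it too.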

Job Completion
§ By default the job will automatically generate two files when it completes:
  § Standard Output: e.g. jobname.o5945571
  § Standard Error: e.g. jobname.e5945571
§ You can combine these files by adding the join option to your submission script:
  § #PBS -j oe
§ You can change the output file name
  § #PBS -o /mnt/scratch/home/netid/myoutputfile.txt
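
The default file names are built from the job name and the numeric part of the job ID, which can be sketched as follows. The job name is a placeholder and the ID is the sample value shown in these slides; on the cluster the ID would come from the output of qsub.

```shell
#!/bin/bash
job_name="jobname"             # placeholder job name
full_id="5945571.cmgr0"        # sample ID; on the cluster: full_id=$(qsub myjob.sub)

job_number="${full_id%%.*}"    # drop everything from the first dot onward

stdout_file="${job_name}.o${job_number}"
stderr_file="${job_name}.e${job_number}"

echo "Standard output: ${stdout_file}"
echo "Standard error:  ${stderr_file}"
```

Knowing the names in advance makes it easy to look at the right files once the job finishes.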

Transferring Files
§ SFTP program (Secure File Transfer)
§ Two options are
  § MobaXterm
  § WinSCP
§ Gateway is: rsync.hpcc.msu.edu
§ Drag and drop files between personal computer and HPCC

Where to go for help
§ iCER office hours: Mondays and Thursdays, 1 to 2
  § Biomedical & Physical Sciences Building, 567 Wilson Road, Room 1440
§ Also by appointment
§ http://contact.icer.msu.edu : contact HPCC by submitting a ticket/contact form (MSU login)
§ wiki.hpcc.msu.edu : HPCC User Wiki
§ icer.msu.edu : iCER Home
§ hpcc.msu.edu : HPCC Home

How to convert Windows files to unix
§ dos2unix
  § Converts special end-of-line characters from Windows format to unix
  § dos2unix myRprogram.r
  § Only necessary if you use an editor that does not use the unix end-of-line character
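
To see what the conversion does, the sketch below creates a file with Windows line endings, converts it, and counts the carriage returns that remain. The file contents are made up, and the tr fallback is a substitute added here for machines where dos2unix is not installed; it is not from the slides.

```shell
#!/bin/bash
# Create a small file with Windows (CRLF) line endings.
crlf_dir=$(mktemp -d)
crlf_file="${crlf_dir}/myRprogram.r"
printf 'x <- 1\r\ny <- 2\r\n' > "$crlf_file"

if command -v dos2unix >/dev/null 2>&1; then
    dos2unix "$crlf_file"                      # convert in place
else
    # Fallback (assumption): strip carriage returns with tr instead.
    tr -d '\r' < "$crlf_file" > "${crlf_file}.tmp" && mv "${crlf_file}.tmp" "$crlf_file"
fi

# Count carriage-return characters left in the file.
cr_count=$(( $(tr -cd '\r' < "$crlf_file" | wc -c) ))
echo "Carriage returns remaining: ${cr_count}"
```

Job scripts written on Windows benefit from the same treatment, since stray carriage returns can confuse the shell.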