Welcome to HPC

  • Please log in with HawkID (IOWA domain) and password to the type of machine with which you are most comfortable
  • Mac users: You don't need to log into iCloud
  • Windows may take a while for login
  • You may have to press the power button on the Apple
  • To switch between the Dell and the Apple computer, press Scroll Lock twice

9/18/2020

Intro to HPC with the Helium Cluster

Ben Rogers, ITS-Research Services
October 3, 2012

What is your field?

  • Chemistry
  • Engineering
  • Genetics
  • Hydrology
  • Imaging
  • Physics
  • Statistics
  • Others?

Cluster Overview

  • What is a Compute Cluster?
  • High Performance Computing
  • High Throughput Computing
  • The Helium Cluster
  • Mapping Your Problem to a Cluster

What is a Compute Cluster?

  • Large number of computers
  • Software that allows them to work together
  • A tool for solving large computational problems that require more memory or CPU than is available on a single system

Image: http://www.flickr.com/photos/fullaperture/5435786866/sizes/l/in/photostream/

High Performance Computing

  • Using multiple computers in a coordinated way to solve a single problem
  • Provides the ability to:
    ◦ Use 10s-1000s of cores to solve a single problem
    ◦ Access 10s-1000s of GB of RAM
  • Likely to require substantial code modification to use a library such as MPI
  • Common Examples:
    ◦ Computational Fluid Dynamics
    ◦ Molecular Dynamics

High Throughput Computing

  • Using multiple computers in a coordinated way to solve many individual problems
  • Provides the ability to:
    ◦ Analyze many data sets simultaneously
    ◦ Efficiently perform a parameter sweep
  • Requires minimal code modifications
  • Common Examples:
    ◦ Image Analysis
    ◦ Genomics

The Helium Cluster

  • Collaborative Cluster
  • CentOS 5 (Linux)
  • ~400 compute nodes
  • ~3800 processor cores
  • 24-144 GB of RAM/node
  • 300 TB+ Storage
  • 40 Gb/s InfiniBand Network

Helium Storage

  • Home Account Storage
    ◦ NFS
    ◦ 80 TB Total
    ◦ 1 TB per User
  • Shared Scratch Storage
    ◦ NFS: 150 TB
    ◦ Gluster: 146 TB
    ◦ Deleted after 30 days
  • Local Scratch Storage
    ◦ 600 GB+/Compute Node
  • No Backups!

So It Just Runs Faster, Right?

  • Not quite!
  • Just running on Helium won't necessarily make your program faster.

Image: http://basementgeographer.blogspot.com/2012/03/international-racing-colours.html

Mapping Your Problem to a Cluster

  • Is a cluster a good fit?
    ◦ If your problem
      ▪ Is not tractable on your desktop system
      ▪ Requires more memory than your desktop has available
      ▪ Requires rapid turnaround of results that you can't achieve with a desktop system
      ▪ Would benefit from having jobs scheduled
    ◦ Or if you don't want to tie up your desktop computer
  • Your problem may be a good candidate for a cluster!

Mapping Your Problem to a Cluster

  • Next Questions
    ◦ Does your job run on Linux?
    ◦ Can your job run in batch mode?
    ◦ Is your job HPC or HTC?
  • Next Steps
    ◦ Develop Strategy for Running Jobs
    ◦ Install Software
    ◦ Develop Job Submission Scripts
    ◦ Run Your Job

The Challenge: Analyze 1000 MRIs

  • Run Freesurfer on 1000 MRIs
    ◦ Takes 20 Hours per MRI
    ◦ Requires 2 GB of Memory/analysis
  • Desktop Analysis Time
    ◦ 20 Hours x 1000 MRIs = 20,000 Hours
      ▪ 2.3 Years!
    ◦ But I have a Quad Core Desktop with 8 GB
      ▪ That's still over six months!

http://surfer.nmr.mgh.harvard.edu/
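The arithmetic on this slide can be checked with a few lines of shell. This is only a sketch: the variable names are made up for illustration, and it assumes the quad-core desktop runs one analysis per core (8 GB at 2 GB per analysis also allows four at a time).

```shell
#!/bin/bash
# Figures taken from the slide.
hours_per_mri=20
mri_count=1000
mem_per_analysis_gb=2

# Desktop described on the slide.
cores=4
mem_gb=8

# Serial run: one MRI after another.
total_hours=$(( hours_per_mri * mri_count ))

# Assumption: concurrency is limited by whichever runs out first, cores or memory.
by_mem=$(( mem_gb / mem_per_analysis_gb ))
concurrent=$(( cores < by_mem ? cores : by_mem ))
quad_hours=$(( total_hours / concurrent ))

# Convert to years and (30-day) months for readability.
serial_years=$(awk -v h="$total_hours" 'BEGIN { printf "%.1f", h / 24 / 365 }')
quad_months=$(awk -v h="$quad_hours" 'BEGIN { printf "%.1f", h / 24 / 30 }')

echo "Serial: ${total_hours} hours (about ${serial_years} years)"
echo "Quad core: ${quad_hours} hours (about ${quad_months} months)"
```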

Analyze 1000 MRIs: Using the Helium Cluster

  • Good fit for cluster? – Yes
  • Type of problem – HTC
  • Software – Runs on Linux in batch mode

Time to Analyze

  • On Helium – As little as 20 hours
    ◦ Time dependent on cores available, likely complete within a week.
    ◦ Possible to run all analyses simultaneously
      ▪ 1000 processor cores – Total on Helium > 3700
      ▪ 2000 GB of memory – Total on Helium > 9000 GB
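One common way to run many independent analyses under Grid Engine is an array job, where the `-t` option starts one task per index and each task reads its index from `SGE_TASK_ID`. The sketch below is not the script used in the workshop: the job name, the `subjects_demo.txt` list, and the `echo` standing in for the real analysis command are all assumptions for illustration, and Helium's own configuration may call for additional options.

```shell
#!/bin/bash
#$ -N mri_sweep
#$ -cwd
#$ -t 1-1000
# The directives above ask Grid Engine for 1000 tasks, numbered 1 to 1000.

# Grid Engine sets SGE_TASK_ID for each task; default to 1 so the
# sketch can also be tried outside the scheduler.
TASK_ID="${SGE_TASK_ID:-1}"

# Hypothetical list with one subject name per line.
SUBJECT_LIST="${SUBJECT_LIST:-subjects_demo.txt}"

# Demo only: create a tiny list if none exists.
[ -f "$SUBJECT_LIST" ] || printf 'subject_%04d\n' 1 2 3 > "$SUBJECT_LIST"

# Pick the line that matches this task's index.
SUBJECT="$(sed -n "${TASK_ID}p" "$SUBJECT_LIST")"

# Placeholder for the real analysis command.
echo "Task ${TASK_ID} would analyze ${SUBJECT}"
```

Each task is scheduled independently, so how many run at once depends on the cores available at the time.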

Analyze 1000 MRIs: Using the Helium Cluster

  • What's the catch?
    ◦ Time and effort needed to understand how to run your analysis on Helium
    ◦ Shared Resource
      ▪ Job wait time
      ▪ Job eviction

Break

  • Take ~10 minutes
  • Start in on hands-on activities

Poll

  • Has anyone not used a Unix or Linux command line before?
  • Mac OS X Terminal counts

Logging In

  • Getting logged in to Helium
    ◦ Need to use your Iowa domain password
    ◦ Windows will use SecureCRT
    ◦ Mac will use SecureCRT or ssh from Terminal
    ◦ Make sure you do not check "save password"
    ◦ You will be prompted to accept a key.
      ▪ Say yes to accepting the key

I'm logged in; now what?

  • I'm connected; how do I start working?
    ◦ Helium is a batch, queued system
    ◦ Jobs must be submitted to a queue and wait their turn; then they are processed
      ▪ When running in a batch system you do not interact with your program in real time.
      ▪ You have to specify options in your job script

Transferring Data to Helium

  • Globus Online/GridFTP
    ◦ https://www.icts.uiowa.edu/confluence/display/ICTSit/Globus+Online
  • sftp
  • IPSwitch WS_FTP on Windows
    ◦ https://helpdesk.its.uiowa.edu/software/download/wsftp/
  • Fetch on Mac
    ◦ https://helpdesk.its.uiowa.edu/software/download/fetch/

Queues

  • Investor queues
    ◦ Investor owned, guaranteed access
    ◦ Access restricted to specified users
  • UI queue
    ◦ Treated like an investor queue but everyone has access
    ◦ Limited to 25 RUNNING jobs per user
  • all.q
    ◦ No job limits
    ◦ Subject to eviction

Slots

  • Slots are roughly equivalent to processor cores
  • The number of slots you specify can determine how soon your job will be able to run
  • When using MPI we recommend using full machines (e.g., request a number of slots evenly divisible by the number of slots on a machine)
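The "full machines" recommendation amounts to rounding a slot request up to the next multiple of the slots on one machine. A minimal sketch follows; the helper name `round_up_slots` and the figure of 12 slots per machine are assumptions for illustration, not values from the slides, so check the actual slot count of the machines you are targeting.

```shell
#!/bin/bash
# Round a requested slot count up to a whole number of machines.
# $1 = slots wanted, $2 = slots on one machine
round_up_slots() {
    local wanted=$1 per_node=$2
    echo $(( (wanted + per_node - 1) / per_node * per_node ))
}

slots_per_node=12   # hypothetical value for illustration only
requested=$(round_up_slots 40 "$slots_per_node")
echo "Request ${requested} slots to fill whole machines"
```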

Which queue to use?

Job Scripts

  • Similar to regular shell scripts
  • First line must contain path to a valid shell
    ◦ Ex. #!/bin/bash
  • Comments start with #
  • Directives start with #$
    ◦ Specify options to Grid Engine
    ◦ These are not comments
  • Explanation of sleeper script in editor
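The sleeper script itself is not reproduced in these slides, so the following is only a sketch of what a job script with the pieces listed above might look like. The file name `sleeper_sketch.sh`, the placeholder email address, and the particular set of directives are assumptions; the options shown (`-N`, `-cwd`, `-j`, `-M`, `-m`, `-q`) are standard Grid Engine `qsub` options. The snippet writes the script to a file and checks its syntax without running it.

```shell
# Write a hypothetical job script to disk (quoted heredoc: nothing is expanded).
cat > sleeper_sketch.sh <<'EOF'
#!/bin/bash
# A plain comment: ignored by the shell and by Grid Engine.
# Lines starting with "#$" are directives read by Grid Engine.
#$ -N sleeper
#$ -cwd
#$ -j y
#$ -M you@example.com
#$ -m be
#$ -q all.q

echo "Started on $(hostname)"
sleep 60
echo "Finished"
EOF

# Check the script for shell syntax errors without executing it.
bash -n sleeper_sketch.sh && echo "syntax OK"
```

Here `-N` names the job, `-cwd` runs it from the submission directory, `-j y` merges stderr into stdout, `-M` and `-m be` request mail at the beginning and end of the job, and `-q` selects the queue.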

Submit a job

  • Copy job file from /nfsscratch/sleeper.sh
    ◦ cp /nfsscratch/sleeper.sh .
  • Edit sleeper.sh
    ◦ vi or emacs if you have a favorite editor
    ◦ If new, try nano
  • Change to your email address
  • Launch job
    ◦ qsub sleeper.sh
      ▪ Your job 5588776 ("sleeper") has been submitted
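The job number in the submission message is what later commands such as `qdel` need. As a small sketch, it can be pulled out of the message with `awk`; the sample message is copied from the slide, while the variable names are made up for illustration. On the cluster the message would come from `qsub` itself, as noted in the comments.

```shell
#!/bin/bash
# On the cluster you would capture the real message instead:
#   submit_msg="$(qsub sleeper.sh)"
submit_msg='Your job 5588776 ("sleeper") has been submitted'

# The job number is the third whitespace-separated field of the message.
job_id="$(printf '%s\n' "$submit_msg" | awk '{ print $3 }')"
echo "Job ID: ${job_id}"

# Typical follow-up commands on the cluster (not run here):
#   qstat -u "$USER"    # list your jobs
#   qdel "$job_id"      # cancel this job
```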

What's my job doing?

  • Check the status of your job
    ◦ qstat -u [username]
    ◦ Shows all your jobs
  • Check for output files
    ◦ [Scriptname].o[jobnumber]
    ◦ Example: Sleeper.o3101075

Stopping a job

  • What if you want to cancel a job?
    ◦ qdel
  • Let's submit and try it
    ◦ Edit sleep 60 to sleep 180 (sleeps for 180 seconds)
    ◦ Submit job
    ◦ Check status
    ◦ qdel [jobnumber]
    ◦ Wait a few seconds and then check status

Modules

  • What are environment modules?
    ◦ System for changing your shell environment
    ◦ Can be loaded and unloaded
    ◦ Find modules via "module avail"
    ◦ Load via "module load"
    ◦ Unload via "module unload"
  • Try to invoke mpicc -show
  • Load the openmpi_gnu_1.4.3 module
  • Now try invoking mpicc -show
  • If a program is not working, you may need to load a module

I/O Streams & Redirection

  • Can anyone tell me what I/O streams are?
    ◦ Standard input (stdin)
    ◦ Standard output (stdout) (1)
    ◦ Standard error (stderr) (2)
    ◦ Both stdout & stderr (&)
      ▪ Only some shells (bash) support this
    ◦ Redirect using > for output and < for input
    ◦ 2> redirects stderr and &> redirects both
  • What's an example of how you've used them?
    ◦ Redirection happens when your job runs on the cluster
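The redirection operators listed above can be tried safely in a scratch directory. The sketch below assumes bash (needed for `&>`); the function and file names are made up for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so no existing files are overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# A small function that writes one line to each output stream.
say_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# > captures stdout, 2> captures stderr, each in its own file.
say_both > out.txt 2> err.txt

# &> (bash only) sends both streams to the same file.
say_both &> both.txt

# < feeds a file to a command's standard input.
printf 'b\na\n' > input.txt
sort < input.txt > sorted.txt

echo "Files written in ${workdir}"
```

In shells without `&>`, the portable form `> both.txt 2>&1` has the same effect.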

Common issue

  • Newline characters
  • "I'm getting weird errors and I don't see anything wrong with my file."

What are those ^M doing there?

Options for Windows Users

  • Use a Windows editor that has a UNIX line ending setting such as Notepad++ (free software)
  • Edit your text files exclusively on Helium
  • Use dos2unix on Helium to convert the files
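The ^M characters are carriage returns left over from Windows line endings. The sketch below builds a small file with Windows line endings, counts the affected lines, and strips the carriage returns with `tr`, which is used here as a stand-in for `dos2unix`; note that `tr -d '\r'` removes every carriage return in the file, not only those at line ends. The file names are temporary and made up for illustration, and `$'\r'` assumes bash.

```shell
#!/bin/bash
# Create a two-line script with Windows (CRLF) line endings.
demo="$(mktemp)"
printf '#!/bin/bash\r\necho hello\r\n' > "$demo"

# Count lines that contain a carriage return.
cr_before=$(grep -c $'\r' "$demo")

# Strip the carriage returns into a new file.
tr -d '\r' < "$demo" > "$demo.unix"
cr_after=$(grep -c $'\r' "$demo.unix")

echo "Lines with carriage returns: before=${cr_before} after=${cr_after}"

# On Helium the slide's suggestion would be, for your own file:
#   dos2unix myscript.sh
```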

Additional Help

  • Individual Consulting
    ◦ By appointment
  • Contact Us:
    ◦ HPC-Sysadmins@iowa.uiowa.edu
  • For additional details visit
    ◦ http://www.hpc.uiowa.edu

Questions?

  • ben-rogers@uiowa.edu
  • Suggestions?
  • Additional Information
    ◦ http://www.hpc.uiowa.edu
    ◦ http://its.uiowa.edu/apps/service.aspx?id=67
    ◦ https://www.icts.uiowa.edu/confluence/display/ICTSit/Helium+Cluster+Quick+Start+Guide