More about queues and nodes

From hpcwiki
Jump to navigation Jump to search

The different queues

The larger hpc clusters, most notably hpc03, hpc06, hpc11 and hpc12, are shared by two or more research groups. On those clusters every group has their own queue, sometimes even more than one. These queues give exclusive and full access to a specific set of nodes.

There is also a guest queue on every hpc cluster that gives access to all nodes, but with some restrictions, you will not be able to run non-rerunable and interactive jobs.

In most cases, access to one of the queues is based on group membership in the Active Directory. If your netid is not a member of the right group, you default to the guest queue if you submit a job. If you have access to the group and bulk network shares of your research group, you should also have access to the normal queue on the hpc cluster. If not, contact the secretary in your research group and let him/her arrange the group membership of your netid.

You can check your default queue by submitting a small test job and then have a look at the list with jobs with the qstat command.

[jsmith@hpc10 ~]$ echo "sleep 60" | qsub 
[jsmith@hpc10 ~]$ qstat -u jsmith

If you see anything other than guest in the third column, then you are all set.

There are two ways to select the guest queue;

With the -q switch on the commandline:

qsub -q guest job1

Or with a directive at the start of your job script:

#PBS -q guest

It is important to know that a job in the guest queue can be interrupted and resumed at any time. You should make sure that the application in your job saves the intermediate results at regular intervals and that it knows how to continue when your job is resumed. If you neglect this, your job in the guest queue will start all over again every time it is interrupted and resumed.