More about queues and nodes: Difference between revisions

Revision as of 14:49, 20 April 2017

The different queues

The larger hpc clusters, most notably hpc03, hpc06, hpc11 and hpc12, are shared by two or more research groups. On those clusters every group has their own queue, sometimes even more than one. These queues give exclusive and full access to a specific set of nodes.

There is also a guest queue on every hpc cluster that gives access to all nodes, but with some restrictions: you will not be able to run non-rerunnable or interactive jobs.

In most cases, access to one of the queues is based on group membership in the Active Directory. If your netid is not a member of the right group, your jobs default to the guest queue when you submit them. If you have access to the group and bulk network shares of your research group, you should also have access to the normal queue on the hpc cluster. If not, contact the secretary of your research group and ask them to arrange the group membership for your netid.

You can check your default queue by submitting a small test job and then looking at the list of jobs with the qstat command.

[jsmith@hpc10 ~]$ echo "sleep 60" | qsub 
[jsmith@hpc10 ~]$ qstat -u jsmith

If you see anything other than guest in the third column, then you are all set.
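
The check above can also be scripted. The following is a minimal sketch, assuming that `qstat -u` prints one line per job with the user name in the second column and the queue in the third, as described above, and that the user name is not truncated in the listing. The function name `job_queues` is hypothetical.

```shell
# Print the queue (third column) of every job owned by the given user.
# Reads the output of `qstat -u <user>` from standard input.
job_queues() {
    # Header and separator lines are skipped because their second
    # field never equals the user name.
    awk -v user="$1" '$2 == user { print $3 }'
}

# Usage on the cluster (not executed here):
#   qstat -u "$USER" | job_queues "$USER"
```

If every line printed is something other than guest, your jobs are going to the queue of your research group.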

There are two ways to select the guest queue:

With the -q switch on the commandline:

qsub -q guest job1

Or with a directive at the start of your job script:

#PBS -q guest

It is important to know that a job in the guest queue can be interrupted and resumed at any time. You should make sure that the application in your job saves the intermediate results at regular intervals and that it knows how to continue when your job is resumed. If you neglect this, your job in the guest queue will start all over again every time it is interrupted and resumed.
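
How an application saves and restores its state depends on the application itself. As an illustration only, the sketch below shows the general pattern for a job that works through numbered steps: it records the last completed step in a file and continues after that step when it is started again. The names `CHECKPOINT`, `TOTAL_STEPS` and `run_from_checkpoint` are hypothetical, and the echo line stands in for the real work.

```shell
# Illustrative checkpoint pattern; all names here are hypothetical.
CHECKPOINT="${CHECKPOINT:-checkpoint.txt}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

run_from_checkpoint() {
    # Resume after the last completed step, or start at step 1.
    last=0
    if [ -f "$CHECKPOINT" ]; then
        last=$(cat "$CHECKPOINT")
    fi
    step=$((last + 1))
    while [ "$step" -le "$TOTAL_STEPS" ]; do
        echo "working on step $step"    # placeholder for the real work
        # Write to a temporary file and rename it, so an interruption
        # never leaves a half-written checkpoint behind.
        echo "$step" > "$CHECKPOINT.tmp" && mv "$CHECKPOINT.tmp" "$CHECKPOINT"
        step=$((step + 1))
    done
}
```

With this pattern, a job that is interrupted after step 3 continues with step 4 when it is resumed, instead of starting over at step 1.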

The different nodes

On most hpc clusters you'll find that worker nodes are not all identical. Different series of nodes exist, which were purchased at different times and with different specifications. To distinguish between the different series of nodes, they are labelled with properties like typea, typeb, typec, etc. On some hpc clusters, nodes have extra properties showing to which queue they belong or showing additional features, like an InfiniBand network or extra memory compared to similar nodes.

A useful command that shows all nodes and how they are utilized is LOCALnodeload.pl. A typical output looks like this:

[jsmith@hpc10 ~]$ LOCALnodeload.pl
Node       Np State/jobs Load  Properties
---------- -- ---------- ----- ----------
n10-01     12 12         12.01 typea     
n10-02     12 free        0.00 typea     
n10-03     12 free        0.00 typea     
n10-04     12 free        0.00 typea     
n10-05     16 12         11.93 typeb     
n10-06     16 free        0.00 typeb     
n10-07     16 offline     0.00 typeb     
n10-08     16 down        0.00 typeb

The first column (Node) shows the names of the nodes. The second column (Np) shows the total number of processors. The third column (State/jobs) shows the number of processors currently in use or the status of the node (free, offline or down). The fourth column (Load) shows the actual load on the nodes. In an ideal situation the load matches the number of processors in use. The last column (Properties) shows the properties as described above. As you can see in the example, typea nodes have 12 processors and typeb nodes have 16. Node n10-01 is fully occupied, node n10-05 is running one or more jobs but still has 4 processors free. Nodes n10-07 and n10-08 cannot be used.
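
The arithmetic described above (free processors = Np minus processors in use) can be sketched as a small filter. It assumes the column layout shown in the example output; the function name `free_processors` is hypothetical.

```shell
# List each usable node with its number of free processors.
# Reads the output of LOCALnodeload.pl from standard input.
free_processors() {
    awk '$2 ~ /^[0-9]+$/ {
        if ($3 == "free")
            print $1, $2                 # idle node: all processors free
        else if ($3 ~ /^[0-9]+$/ && $2 - $3 > 0)
            print $1, $2 - $3            # partly used node
        # fully occupied, offline and down nodes are skipped
    }'
}

# Usage on the cluster (not executed here):
#   LOCALnodeload.pl | free_processors
```

For the example output above, this would list n10-02, n10-03 and n10-04 with 12 free processors, n10-05 with 4 and n10-06 with 16.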

Selecting nodes

If you submit a job, the scheduler automatically selects a node to run it on. By default a job gets one node and one processor. You can manually select the number of processors and nodes for your job by using the -l switch with the qsub command. You can also select nodes by property. The -l switch works like this:

qsub -l nodes=<x>:ppn=<c>:<property>:<property>...

<x> is either a number of nodes or the name(s) of the selected node(s)
<c> is the number of processors per node
<property> is any of the properties you see in the Properties column of the LOCALnodeload.pl output.

Examples:

`qsub -l nodes=4`	Request 4 nodes of any type
`qsub -l nodes=n10-07+n10-08`	Request 2 specific nodes by hostname
`qsub -l nodes=4:ppn=2`	Request 2 processors on each of four nodes
`qsub -l nodes=1:ppn=4`	Request 4 processors on one node
`qsub -l nodes=2:typea`	Request 2 nodes with the `typea` property

Instead of using the -l or the -q switches on the commandline when you submit your job with qsub, you can also add them as a directive to your job script. For instance, if you add

#PBS -l nodes=1:ppn=4
#PBS -q guest

at the start of your script, you can just use

qsub job.sh

instead of

qsub -l nodes=1:ppn=4 -q guest job.sh
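
As a sketch of how these directives fit into a complete job script, the example below combines them with a short body. It assumes a Torque-style PBS that sets `PBS_O_WORKDIR`, `PBS_JOBID` and `PBS_QUEUE` inside a running job. The function `job_summary` is a hypothetical helper, and the fallback values only apply when the script is run outside PBS.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=4
#PBS -q guest

# Change to the directory the job was submitted from. Outside PBS the
# variable is unset, so fall back to the current directory.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Hypothetical helper: report which job and queue this script runs in.
job_summary() {
    echo "job ${PBS_JOBID:-none} in queue ${PBS_QUEUE:-none}"
}

job_summary
# The line that starts your program goes here.
```

Because the #PBS lines are ordinary comments to the shell, the same script can also be run by hand for a quick test before you submit it with qsub.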

Avoid over- and underutilization

An important thing to consider when you create your own job script is matching the number of processors that you request with the number of processors that the software in your script will actually use. It is possible that you request only one processor and that your program uses all processors available on the node. This is called overutilization and is not very efficient when other jobs are already running on the same node and using the same processors.

It is also possible that you request several (or all) processors and that your program will only use one. This will leave the other processors you claimed unused (underutilization), which is also not very efficient because the unused processors you requested will not be used for other jobs.

How do you avoid over- and underutilization? Many programs have options that let them use only one thread (utilization of only one processor) or a specific number of threads.

For example, Ansys has the -np switch:

ansys -np N

and Fluent has the -t switch:

fluent -tN

where N matches the number of processors that you request in your job.

If your program does not have an option to limit the number of processors, you can try to add this line to your job script, just before the line where your program starts:

export OMP_NUM_THREADS=N

Of course, N must match the number of processors that you request in your job. Alternatively, you could also request an entire node (all processors) in your job and let your program use all available resources of that node.
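
Instead of typing N by hand, the value can be derived from the job itself. This is a minimal sketch, assuming a single-node job and a Torque-style `PBS_NODEFILE` that contains one line per assigned processor. The function name `assigned_processors` is hypothetical.

```shell
# Count the processors PBS assigned to this job. Assumes a single-node
# job and a node file with one line per assigned processor.
assigned_processors() {
    nodefile="${1:-${PBS_NODEFILE:-}}"
    if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
        # Count the non-empty lines in the node file.
        grep -c . "$nodefile"
    else
        # No readable node file (for example outside PBS): use one processor.
        echo 1
    fi
}

export OMP_NUM_THREADS="$(assigned_processors)"
```

This keeps the thread count in step with the ppn value in your request, so changing the request does not require a second edit in the script.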

Avoid excessive reads and writes on your homedir

Some programs read and write a lot of data to and from your home directory. This is not very efficient: on the nodes your home directory is a network share, so access is relatively slow and it keeps the master node unnecessarily busy. If you expect that your job will do a lot of reading and writing to disk, you can use the local disk on the node instead, which is mounted on /var/tmp on all nodes. You can do this by adding a few extra lines to your job script, right before the line that starts the program in your job, for example:

# Copy the submit directory to a job-specific directory on the local disk
TMP=/var/tmp/${PBS_JOBID}
mkdir -p "${TMP}"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}/" "${TMP}/"
cd "${TMP}"

Once your program is done, you can copy the results back to your home directory and clean up by adding these two lines at the end of your job script:

/usr/bin/rsync -vax "${TMP}/" "${PBS_O_WORKDIR}/"
[ $? -eq 0 ] && /bin/rm -rf "${TMP}"

This usually works best if you create a separate directory in your homedir, move the necessary files and the job script to it and run your job from there. Otherwise you would end up copying your entire home directory to the node for no good reason.
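
The copy-in and copy-out steps can also be wrapped in two functions, which keeps the middle of the job script readable. The sketch below is one possible arrangement, not the only one. `SCRATCH_BASE`, `copy_tree`, `stage_in` and `stage_out` are hypothetical names, and the fallback to cp is only there for systems without rsync.

```shell
# Base directory for local scratch space; /var/tmp as described above.
SCRATCH_BASE="${SCRATCH_BASE:-/var/tmp}"

copy_tree() {
    # Copy the contents of directory $1 into directory $2.
    if command -v rsync >/dev/null 2>&1; then
        rsync -ax "$1/" "$2/"
    else
        cp -Rp "$1/." "$2/"
    fi
}

stage_in() {
    # Create the scratch directory, fill it from the submit directory
    # and print its path.
    scratch="${SCRATCH_BASE}/${PBS_JOBID}"
    mkdir -p "$scratch" && copy_tree "$PBS_O_WORKDIR" "$scratch" && echo "$scratch"
}

stage_out() {
    # Copy everything back; remove the scratch directory only if the
    # copy succeeded, so results are never deleted before they are safe.
    copy_tree "$1" "$PBS_O_WORKDIR" && rm -rf "$1"
}

# Usage inside a job script (not executed here):
#   scratch=$(stage_in) && cd "$scratch"
#   ... start your program ...
#   stage_out "$scratch"
```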
