Introduction

About the TU Delft hpc clusters

Several research groups at the TU Delft have exclusive access to hpc (high performance computing) clusters that are managed and maintained by the SSC ICT. There are currently nine of them, typically named hpcXX.tudelft.net, where XX is a number ranging from 03 to 12. All of these are Beowulf style clusters.
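
As a minimal sketch of what access looks like (assuming your group has an account and that logins go over ssh, as is usual for Linux clusters), you would connect to the master node from a terminal; the hostname hpc06 and the username are only placeholders:

 # log in to the master node of your group's cluster (hostname and username are examples)
 ssh yourusername@hpc06.tudelft.net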

The purpose of this wiki is to guide new users, who have never used a Beowulf cluster before, in using these clusters. A Beowulf cluster is typically made up of a number of common, identical computers or servers linked together in a network. Such a cluster can be used to perform parallel computing tasks on a much larger scale than would be possible on a single workstation or a heavy duty server.

About Linux

The operating system on all our clusters is CentOS 7, a well known server class Linux distribution. In order to use our hpc clusters you should have at least some basic knowledge of Linux on the command line. You will not find a Linux tutorial in this wiki; there are many tutorials on the internet that will teach you how to use Linux. Just type "Linux for beginners" into your favorite search engine and you will find plenty of sources online, for example the UNIX Tutorial for Beginners from the University of Surrey. You can also opt to buy a book; "Linux for Dummies" would be a good start, but there are many others that will do just fine. If you have never used Linux on the command line before, please take the time to learn the basics; it will be well worth the effort.
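
To give an impression of what "the basics" cover, a short command line session might look like the sketch below; the directory and file names are only examples:

 pwd                  # show the directory you are currently in
 ls -l                # list the files in that directory with details
 mkdir myproject      # create a new directory
 cd myproject         # change into it
 nano input.txt       # edit a file with a simple text editor
 man ls               # read the manual page of a command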

More details

Typically, a Beowulf style cluster consists of a master node and several worker nodes. The master node is the machine where you log in and where you prepare and manage your parallel jobs. A scheduler (Maui) and a resource manager (Torque) both run on the master node. Together they provide a mechanism to submit jobs that will be run on the worker nodes, where the actual parallel processing takes place. The worker nodes are stand-alone computers, connected to the master node through a local network, that patiently wait until the scheduler and the resource manager on the master node tell them to run a (part of a) parallel job.
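
To give an idea of how this works in practice, a minimal Torque job script might look like the sketch below; the job name, the requested resources and the program name (my_program) are only placeholders, and the exact limits available on our clusters may differ:

 #!/bin/bash
 #PBS -N example_job                # name of the job
 #PBS -l nodes=1:ppn=4              # request 1 worker node with 4 processor cores
 #PBS -l walltime=01:00:00          # maximum run time of 1 hour
 cd $PBS_O_WORKDIR                  # start in the directory the job was submitted from
 ./my_program                       # run your (parallel) program here

You would save this as, for example, example_job.sh on the master node, submit it with "qsub example_job.sh", and follow its progress in the queue with "qstat". The scheduler and resource manager then decide when and on which worker nodes the job actually runs.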