SLURM general description

The SLURM software architecture deployed on OSIRIM consists of:

  An interactive node (osirim-slurm.irit.fr)

This is the node to which you must connect to access the compute cluster. This node (running Linux CentOS 7) can be used to validate programs before running them on the compute cluster. Since it is shared among all users, it must not be used to run long jobs.
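In practice, a session on the interactive node typically looks like the sketch below: connect over SSH, run only short checks there, and leave the real work to the compute nodes. The login placeholder and program name are hypothetical and only serve as an illustration.

    # Connect to the interactive node (replace <login> with your OSIRIM account)
    ssh <login>@osirim-slurm.irit.fr

    # Short validation runs are acceptable here, for example checking that
    # a program starts and parses its options correctly
    ./my_program --help

    # Long or resource-hungry runs must instead be submitted to the compute
    # nodes through SLURM (see the following sections)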

  Compute nodes

These nodes (running Linux CentOS 7) are servers dedicated to computation. The SLURM job manager handles the distribution and execution, on the compute nodes, of the processes that you launch from the interactive node. A process running on a compute node reads data hosted on the storage array, performs its computation, and writes the result back to the array.
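As an illustration of this workflow, the standard SLURM client commands can be used from the interactive node to submit a batch script and follow its execution on the compute nodes; the script name and job id below are placeholders.

    # Submit a batch script from the interactive node
    sbatch my_job.sh

    # List your pending and running jobs
    squeue -u $USER

    # Inspect a job's accounting record once it has finished
    sacct -j <jobid>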

The compute nodes are divided into 3 sizing categories:

- Compute nodes with 24 processors and 192 GB of RAM each. These nodes are grouped in a Slurm partition named "24CPUNodes". This partition is suitable for most use cases; each process is however limited to 24 threads and/or 192 GB of RAM. On the other hand, the number of processes created by a Job, a Job Step or a Task is limited only by the total size of the partition (and the availability of resources): for example, a single Job can run 3 Steps in parallel, with 2 Tasks each, each Task creating 24 threads. Such a Job uses 144 CPUs and is distributed over 6 nodes (see the batch script sketch after this list). This is the default partition.

- Compute nodes with 64 processors and 512 GB of RAM each. These nodes are grouped in a Slurm partition named "64CPUNodes". This partition is intended for Jobs requiring more than 24 threads and/or 192 GB of RAM for a single process.

- 7 compute nodes, each with 4 Nvidia GeForce GTX 1080 Ti graphics cards. These nodes are grouped in a Slurm partition named "GPUNodes". This partition is intended for workloads that take advantage of the computing power provided by GPUs; deep learning frameworks (TensorFlow, Theano, PyTorch, ...) are a typical example. A GPU batch script sketch is given after this list.
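As a minimal sketch of how these partitions can be targeted, the two batch scripts below illustrate, respectively, the 144-CPU example described above for "24CPUNodes" and a single-GPU Job on "GPUNodes". The program names are placeholders, and the exact srun options needed to run Steps concurrently (for example --exclusive) may depend on the SLURM version installed.

    #!/bin/sh
    #SBATCH --job-name=multi-step-example
    #SBATCH --partition=24CPUNodes
    #SBATCH --ntasks=6                # 3 Steps x 2 Tasks
    #SBATCH --cpus-per-task=24        # each Task creates 24 threads
    #SBATCH --ntasks-per-node=1       # one Task per node -> 6 nodes, 144 CPUs

    # Launch 3 Job Steps in parallel, each with 2 Tasks of 24 CPUs
    # (step_program_* are placeholder executables)
    srun --ntasks=2 --cpus-per-task=24 ./step_program_1 &
    srun --ntasks=2 --cpus-per-task=24 ./step_program_2 &
    srun --ntasks=2 --cpus-per-task=24 ./step_program_3 &
    wait

A Job on the GPU partition additionally requests graphics cards through the generic resource (gres) mechanism:

    #!/bin/sh
    #SBATCH --job-name=gpu-example
    #SBATCH --partition=GPUNodes
    #SBATCH --ntasks=1
    #SBATCH --gres=gpu:1              # request 1 of the node's 4 GPUs

    # train_model.py is a placeholder for a deep learning training script
    srun python train_model.py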

  A storage array

With a capacity of approximately 1 PB, this storage is provided by an Isilon array composed of 11 nodes. Data is accessible from the interactive node and the compute nodes via the NFS protocol.

For the procedure to connect to this platform, see the section "Login to Slurm".