Description of Slurm partitions

The following subsections summarize the parameters of the different partitions available on the DCE. You can retrieve the same information yourself by logging in to the gateway chome.metz.supelec.fr and running the sinfo command.
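As a quick check, the commands below query Slurm for the partition parameters listed on this page (the partition names are the ones documented here; the exact output depends on the current cluster state):

```shell
# Summarize all partitions: name, availability, time limit, node counts
sinfo -s

# Show one partition with its availability, time limit and node list
sinfo -p gpu_inter --format="%P %a %l %N"

# Show the full scheduler-side configuration of a partition
scontrol show partition gpu_inter
```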

GPU partitions

These partitions are meant for GPU computing and route jobs onto GPU-equipped nodes.

| Partition name | Max jobs / user | Max nodes / job | Max walltime | Availability | Node pool | Allowed Slurm commands |
|----------------|-----------------|-----------------|--------------|--------------|-----------|------------------------|
| gpu_inter      | 1               | 1               | 02:00:00     | anytime      | sh, cam, tx | srun, salloc, sbatch |
| gpu_tp         | 1               | 1               | 02:00:00     | anytime      | cam, tx   | srun, salloc, sbatch   |
| gpu_prod_night | 4               | 1               | 12:00:00     | nights and weekends * | cam, tx | sbatch        |
| gpu_prod_long  | 4               | 1               | 48:00:00     | anytime      | sh        | sbatch                 |
| gpu_legacy     | -               | 1               | 48:00:00     | anytime      | sky       | srun, salloc, sbatch   |

* : 6 pm to 7 am from Monday to Friday, around the clock on Saturday and Sunday

Users with a labwork account can access nodes through the gpu_tp partition. Users with a project account can book a node on gpu_inter for a coding session and then submit their jobs to gpu_prod_night or gpu_prod_long.
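As an illustration, an interactive coding session on gpu_inter might be opened as follows (the partition name and the 2-hour limit come from the table above; the GPU count is an example value):

```shell
# Request an interactive shell on one gpu_inter node with one GPU,
# for the partition's maximum walltime of 2 hours
srun --partition=gpu_inter --gres=gpu:1 --time=02:00:00 --pty bash

# Alternatively, create the allocation first, then launch commands
# inside it with srun
salloc --partition=gpu_inter --gres=gpu:1 --time=02:00:00
```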

Generally speaking, we give priority to labworks. If you have a project account and are running an interactive session on either a cam or tx node, you might be preempted if, at the last minute, we need to allocate that node for a labwork. In practice this rarely happens, but you have to be aware of the possibility.

For long-running jobs (e.g. training a neural network), you are expected to submit these trainings to either gpu_prod_night or gpu_prod_long as batch jobs. We give examples of batch submission in the documentation. The gpu_prod_night partition contains many nodes (32) but is only open in the evening and on weekends, and can host jobs running no longer than 12 hours. The reason is that these nodes might be used for a labwork during office hours.
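For reference, a minimal batch script for gpu_prod_night could look like the sketch below (the partition name and 12-hour limit come from the table above; the job name, output file, and training command are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=train-net        # placeholder job name
#SBATCH --partition=gpu_prod_night  # nights-and-weekends partition, 12 h max
#SBATCH --nodes=1                   # this partition allows 1 node per job
#SBATCH --gres=gpu:1                # request one GPU
#SBATCH --time=12:00:00             # must not exceed the 12 h walltime
#SBATCH --output=train-%j.log       # %j expands to the job id

# Placeholder training command; replace with your own
python train.py
```

The script would then be submitted with `sbatch train.sh` (sbatch is the only command allowed on this partition, as shown in the table above).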

CPU partitions

These partitions are meant for computations requiring high-end CPUs.

| Partition name | Max jobs / user | Max nodes / job | Max walltime | Availability | Node pool | Allowed Slurm commands |
|----------------|-----------------|-----------------|--------------|--------------|-----------|------------------------|
| cpu_inter      | 1               | 1               | 02:00:00     | anytime      | kyle[01-68], sar[01-32] | srun, salloc, sbatch |
| cpu_tp         | 1               | 1               | 02:00:00     | anytime      | kyle[01-68], sar[01-32] | srun, salloc, sbatch |
| cpu_prod       | 4               | 1               | 12:00:00     | anytime      | kyle[01-68] | sbatch               |
| cpu_prod_sar   | 4               | 1               | 12:00:00     | specific periods | sar[01-32] | sbatch            |

Depending on your account, you may have access to a higher number of nodes per job through the following QOS:

| QOS name   | Max nodes / job |
|------------|-----------------|
| 8nodespu   | 8               |
| 16nodespu  | 16              |
| 32nodespu  | 32              |
| 64nodespu  | 64              |
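For illustration, a QOS is requested at submission time with the --qos flag. A hypothetical multi-node batch job using the 8nodespu QOS might look like this (job name and command are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=multi-node    # placeholder job name
#SBATCH --partition=cpu_prod     # CPU production partition from the table above
#SBATCH --qos=8nodespu           # QOS raising the per-job node limit to 8
#SBATCH --nodes=8                # allowed by the 8nodespu QOS
#SBATCH --time=12:00:00          # partition walltime limit

# Placeholder command; srun launches one task per allocated node here
srun hostname
```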

Users with a labwork account can access nodes through the cpu_tp partition. Users with a project account can book a node on cpu_inter for a coding session and then submit their jobs to cpu_prod.

Generally speaking, we give priority to labworks. If you have a project account and are running an interactive session, you might be preempted if, at the last minute, we need to allocate that node for a labwork. In practice this rarely happens, but you have to be aware of the possibility.

For long-running jobs, you are expected to submit these computations to cpu_prod (or cpu_prod_sar during its open periods) as batch jobs. We give examples of batch submission in the documentation.
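Once a batch job is submitted, it can be tracked and cancelled with the standard Slurm commands (the job id below is a placeholder):

```shell
# List your own pending and running jobs
squeue -u $USER

# Show detailed information about one job (replace <jobid>)
scontrol show job <jobid>

# Cancel a job
scancel <jobid>
```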