Description of Slurm partitions¶
The following subsections summarize the parameters of the different partitions available on the DCE. To retrieve the same information yourself, log in to the gateway dce.metz.centralesupelec.fr
and run the sinfo
command.
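For instance, once logged in to the gateway, you can list the partitions and their limits (a typical invocation; the exact output depends on the current cluster state):

```shell
# List every partition with its time limit, node count and node list
sinfo --format="%P %l %D %N"

# Restrict the output to a single partition
sinfo --partition=gpu_prod_night
```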
GPU partitions¶
These partitions are meant for GPU computing and redirect jobs onto GPU resources.
Partition name | Max jobs / user | Max nodes / job | Max Walltime | Availability | Node pool | Allowed slurm commands |
---|---|---|---|---|---|---|
gpu_inter | 1 | 1 | 02:00:00 | anytime | sh,cam,tx | srun,salloc,sbatch |
gpu_tp | 1 | 1 | 02:00:00 | anytime | cam,tx | srun,salloc,sbatch |
gpu_prod_night | 4 | 1 | 12:00:00 | nights and weekends * | cam,tx | sbatch |
gpu_prod_long | 4 | 1 | 48:00:00 | anytime | sh | sbatch |
gpu_legacy | - | 1 | 48:00:00 | anytime | sky | srun,salloc,sbatch |
* : 6 pm to 7 am from Monday to Friday, around the clock on Saturday and Sunday
Users that have a labwork account can access nodes through the gpu_tp partition. Users that have a project account can book a node on gpu_inter for a coding session and then submit their jobs to gpu_prod_night or gpu_prod_long.
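A coding session on gpu_inter could be booked as follows (a sketch using standard Slurm options; the `--gres` value is an assumption about the local GPU configuration):

```shell
# Request one node with one GPU on gpu_inter for the 2-hour maximum
salloc --partition=gpu_inter --nodes=1 --gres=gpu:1 --time=02:00:00

# Or open an interactive shell directly on the allocated node
srun --partition=gpu_inter --gres=gpu:1 --time=02:00:00 --pty bash
```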
Generally speaking, we give priority to the labworks. If you have a project account and are running an interactive session on either a cam or tx node, you might be preempted if, at the last minute, we need to allocate that node for a labwork. In practice this rarely happens, but you have to be aware of the possibility.
For long-running jobs (e.g. training a neural network), you are expected to submit these trainings to either gpu_prod_night or gpu_prod_long as batch jobs. We give examples of batch submission in the documentation. The gpu_prod_night partition contains many nodes (32) but is only open during evenings and weekends, and can host jobs running no longer than 12 hours, because these nodes might be used for a labwork during office hours.
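A minimal batch script for gpu_prod_night might look like the following (the job name, training command and `--gres` value are illustrative, not cluster-mandated):

```shell
#!/bin/bash
#SBATCH --job-name=train-net        # illustrative job name
#SBATCH --partition=gpu_prod_night  # night/weekend partition, 12 h max
#SBATCH --nodes=1                   # partition limit: 1 node per job
#SBATCH --gres=gpu:1                # one GPU (assumed gres syntax on this cluster)
#SBATCH --time=12:00:00             # walltime, up to the 12 h limit

python train.py                     # illustrative training command
```

Saved as, say, `train.sbatch`, it would be submitted with `sbatch train.sbatch`.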
CPU partitions¶
These partitions are meant for computing requiring high end CPUs.
Partition name | Max jobs / user | Max nodes / job | Max Walltime | Availability | Node pool | Allowed slurm commands |
---|---|---|---|---|---|---|
cpu_inter | 1 | 1 | 2:00:00 | anytime | kyle[01-68],sar[01-32] | srun, salloc, sbatch |
cpu_tp | 1 | 1 | 2:00:00 | anytime | kyle[01-68],sar[01-32] | srun, salloc, sbatch |
cpu_prod | 4 | 1 | 12:00:00 | anytime | kyle[01-68] | sbatch |
cpu_prod_sar | 4 | 1 | 12:00:00 | specific periods | sar[01-32] | sbatch |
Depending on your account, you may have access to a higher number of nodes per job through the following QOS:
QOS name | Max nodes / job |
---|---|
8nodespu | 8 |
16nodespu | 16 |
32nodespu | 32 |
64nodespu | 64 |
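If your account carries one of these QOS, a multi-node batch job could request it as follows (a sketch assuming the 8nodespu QOS; the executable name is illustrative):

```shell
#!/bin/bash
#SBATCH --partition=cpu_prod
#SBATCH --qos=8nodespu      # QOS raising the per-job node limit to 8
#SBATCH --nodes=8
#SBATCH --time=12:00:00     # partition walltime limit

srun ./my_mpi_program       # illustrative MPI executable, one task set per node
```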
Users that have a labwork account can access nodes through the cpu_tp partition. Users that have a project account can book a node on cpu_inter for a coding session and then submit their jobs to cpu_prod.
Generally speaking, we give priority to the labworks. If you have a project account and are running an interactive session, you might be preempted if, at the last minute, we need to allocate that node for a labwork. In practice this rarely happens, but you have to be aware of the possibility.
For long-running jobs, you are expected to submit these as batch jobs to cpu_prod (or cpu_prod_sar when it is open). We give examples of batch submission in the documentation.