Long Jobs
If you are confident that your jobs will last more than 2 days while efficiently using the allocated resources, you can use a long QoS.
```bash
sbatch --partition={batch|gpu|bigmem} --qos=<cluster>-<partition>-long [...]
```
Following the EuroHPC/PRACE recommendations, the long QOS allows for an extended maximum walltime (MaxWall) set to 14 days.
| Node Type  | Cluster | Partition | Slurm command |
|------------|---------|-----------|---------------|
| regular    | aion    | batch     | `sbatch [--account=<project>] --partition=batch --qos=aion-batch-long [...]` |
| regular    | iris    | batch     | `sbatch [--account=<project>] --partition=batch --qos=iris-batch-long [--constraint={broadwell,skylake}] [...]` |
| gpu (v100) | iris    | gpu       | `sbatch [--account=<project>] --partition=gpu --qos=iris-gpu-long --gpus=1 [--constraint=volta{16,32}] [...]` |
| gpu (h100) | iris    | hopper    | `sbatch [--account=<project>] --partition=hopper --qos=iris-hopper-long --gpus=1 [...]` |
| bigmem     | iris    | bigmem    | `sbatch [--account=<project>] --partition=bigmem --qos=iris-bigmem-long [...]` |
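For illustration, here is a minimal launcher sketch for a long regular job on the `aion` cluster. The walltime, node count, task layout, and application name are placeholder assumptions; adapt them to your own workload and to the restrictions listed below.

```bash
#!/bin/bash -l
# Sketch of a long-job launcher on aion (placeholder resources -- adapt to your needs).
#SBATCH --job-name=long-job-example
#SBATCH --partition=batch
#SBATCH --qos=aion-batch-long      # long QoS: extends MaxWall to 14 days
#SBATCH --time=7-00:00:00          # walltime above the regular 2-day limit, at most 14-00:00:00
#SBATCH --nodes=2                  # stay within the per-job node limits of the long QoS
#SBATCH --ntasks-per-node=128
##SBATCH --account=<project>       # uncomment and set if the job should be charged to a project

srun ./my_long_running_application # replace with your actual workload
```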
Important
Be aware, however, that special restrictions apply to long jobs. In short, the constraints are the following.
- There is a per-partition limit on the maximum number of nodes concurrently involved in long jobs (call the alias `sqos`, defined on the UL HPC systems, for details; see also the sketch after this list).
- In the `batch` partitions, no more than 8 long jobs per user (`MaxJobsPU`) are allowed, each using no more than 16 nodes.
- In the `gpu` partition, no more than 4 long jobs per user (`MaxJobsPU`) are allowed, each using no more than 2 nodes.
- In the `bigmem` partition, no more than 4 long jobs per user (`MaxJobsPU`) are allowed, each using no more than 2 nodes.
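If you want to check the limits attached to a long QoS yourself, you can query the Slurm accounting database with the standard `sacctmgr` command, as sketched below; the `sqos` alias mentioned above presumably wraps a similar query. The QoS name used here is taken from the table above.

```bash
# Show the walltime and per-user limits attached to one long QoS (example: aion-batch-long).
sacctmgr show qos aion-batch-long format=Name%20,MaxWall,MaxJobsPU,MaxTRESPU%40

# List the limits of every *-long QoS defined on the cluster.
sacctmgr --noheader show qos format=Name%20,MaxWall,MaxJobsPU,MaxTRESPU%40 | grep -- '-long'
```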