
Long Jobs

If you are confident that your jobs will run for more than 2 days while making efficient use of the allocated resources, you can submit them to a long QoS.

sbatch --partition={batch|gpu|hopper|bigmem} --qos=<cluster>-<partition>-long [...]

Following EuroHPC/PRACE recommendations, the long QoS allows for an extended maximum walltime (MaxWall) of 14 days.
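
For illustration, here is a minimal submission script targeting the long QoS on a regular Aion node with the maximum 14-day walltime; the job name, task count and executable are placeholders, not recommendations.

#!/bin/bash -l
#SBATCH --job-name=long-job            # placeholder job name
#SBATCH --partition=batch              # regular Aion nodes
#SBATCH --qos=aion-batch-long          # long QoS of the batch partition
#SBATCH --time=14-00:00:00             # up to the 14-day MaxWall
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4            # placeholder task count
##SBATCH --account=<project>           # uncomment to charge a specific project

srun ./my_long_running_app             # placeholder executable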

| Node Type  | Cluster | Partition | Slurm command                                                                                       |
|------------|---------|-----------|-----------------------------------------------------------------------------------------------------|
| regular    | aion    | batch     | sbatch [--account=<project>] --partition=batch --qos=aion-batch-long [...]                           |
| regular    | iris    | batch     | sbatch [--account=<project>] --partition=batch --qos=iris-batch-long [--constraint={broadwell,skylake}] [...] |
| gpu (v100) | iris    | gpu       | sbatch [--account=<project>] --partition=gpu --qos=iris-gpu-long --gpus=1 [--constraint=volta{16,32}] [...]   |
| gpu (h100) | iris    | hopper    | sbatch [--account=<project>] --partition=hopper --qos=iris-hopper-long --gpus=1 [...]                 |
| bigmem     | iris    | bigmem    | sbatch [--account=<project>] --partition=bigmem --qos=iris-bigmem-long [...]                          |

Important

Be aware, however, that special restrictions apply to long jobs. In short, the constraints are the following:

  • There is a per-partition limit on the maximum number of nodes concurrently involved in long jobs (run the sqos alias defined on the UL HPC systems for details; see the example after this list).
  • In the batch partitions, no more than 8 long jobs per user (MaxJobsPU) are allowed, each using no more than 16 nodes.
  • In the gpu partition, no more than 4 long jobs per user (MaxJobsPU) are allowed, each using no more than 2 nodes.
  • In the bigmem partition, no more than 4 long jobs per user (MaxJobsPU) are allowed, each using no more than 2 nodes.
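
If the sqos alias is not available in your shell, a similar overview can be obtained directly from the Slurm accounting database; the command below is a sketch with an illustrative selection of format fields, not the exact definition of the alias.

# List the long QoS with their walltime, per-user job and resource limits
sacctmgr show qos format=Name%25,MaxWall,MaxJobsPU,MaxTRESPU%40 | grep -- -long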