Best-effort Jobs
Node Type | Slurm command |
---|---|
regular | sbatch [-A <project>] -p batch --qos besteffort [-C {broadwell,skylake}] [...] |
gpu | sbatch [-A <project>] -p gpu --qos besteffort [-C volta[32]] -G 1 [...] |
bigmem | sbatch [-A <project>] -p bigmem --qos besteffort [...] |
Best-effort (preemptible) jobs make efficient use of the platform by filling idle compute nodes until regular jobs are submitted.
sbatch -p {batch | gpu | bigmem} --qos besteffort [...]
What does job preemption mean?
Job preemption is the act of "stopping" one or more "low-priority" jobs to let a "high-priority" job run. Job preemption is implemented as a variation of Slurm's Gang Scheduling logic.
When a non-best-effort job is allocated resources that are already allocated to one or more best-effort jobs, the preemptible job(s) (i.e., those submitted with the besteffort QOS) are preempted.
On ULHPC facilities, the preempted job(s) can be requeued (if possible) or canceled.
**For jobs to be requeued, they MUST have the "--requeue" sbatch option set.**
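As an illustration of the requirement above, here is a minimal sketch of a launcher script that submits to the besteffort QOS with requeueing enabled. The partition, node count and time limit are example values, and `describe_attempt` is a hypothetical helper introduced only for this sketch. It relies on `SLURM_RESTART_COUNT`, which Slurm sets when a job has been restarted after a requeue. The application line is left commented out because `./my_app` is a placeholder, not a real program.

```shell
#!/bin/bash -l
#SBATCH -p batch
#SBATCH --qos besteffort
#SBATCH --requeue
#SBATCH -N 1
#SBATCH --time=01:00:00

# Hypothetical helper: report which attempt this is.
# SLURM_RESTART_COUNT is unset on the first run, so default to 0.
describe_attempt() {
    local restarts="${SLURM_RESTART_COUNT:-0}"
    echo "attempt $((restarts + 1))"
}

echo "Job ${SLURM_JOB_ID:-unknown}: $(describe_attempt)"

# Placeholder for the real workload (assumed name, adapt to your application):
# srun ./my_app
```

Logging the attempt number makes it easier to see from the job output whether a run was preempted and restarted.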
The besteffort QOS has fewer constraints than the other QOS (for instance, you can submit more jobs).
As a general rule, users should track the successful completion of their best-effort jobs (which may be interrupted by other jobs at any time) and combine them with mechanisms such as Checkpoint-Restart that allow applications to stop and resume safely.
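The stop-and-resume idea can be sketched at the launcher level as follows. This is a simplified, file-based illustration and not a replacement for a real Checkpoint-Restart tool. The names `CHECKPOINT_FILE`, `TOTAL_STEPS`, `next_step` and `run_steps` are hypothetical, and the "work" is a placeholder `echo`. The script records the last completed step in a file, so that a requeued job continues from the next step instead of starting over.

```shell
#!/bin/bash -l
#SBATCH -p batch
#SBATCH --qos besteffort
#SBATCH --requeue

# Assumed names for this sketch: where progress is stored and how many steps exist.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.txt}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

# Print the next step to run: 1 on a fresh start,
# or the last saved step + 1 after a preemption and requeue.
next_step() {
    if [ -s "$CHECKPOINT_FILE" ]; then
        echo $(( $(cat "$CHECKPOINT_FILE") + 1 ))
    else
        echo 1
    fi
}

# Run the remaining steps, saving progress after each one.
run_steps() {
    local step
    for (( step = $(next_step); step <= TOTAL_STEPS; step++ )); do
        # Placeholder for one unit of real work.
        echo "running step $step"
        # Write to a temporary file and rename it, so that an interruption
        # during the write does not leave a truncated checkpoint.
        echo "$step" > "$CHECKPOINT_FILE.tmp" && mv "$CHECKPOINT_FILE.tmp" "$CHECKPOINT_FILE"
    done
}

run_steps
```

If the job is preempted after step 3 has been saved, the requeued run starts at step 4. Checking that the checkpoint file contains the final step number is one simple way to track successful completion.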