Slurm Launcher Examples¶
ULHPC Tutorial / Getting Started ULHPC Tutorial / OpenMP/MPI
When setting your default #SBATCH
directive, always keep in mind your expected default resource allocation that would permit to submit your launchers
- without options
sbatch <launcher>
(you will be glad in a couple of month not to have to remember the options you need to pass) and - try to stick to a single node (to avoid to accidentally induce a huge submission).
Resource allocation Guidelines¶
General guidelines
Always try to align resource specifications for your jobs with physical characteristics.
Always prefer the use of --ntasks-per-{node,socket}
over -n
when defining your tasks allocation request to automatically scale appropriately upon multi-nodes submission with for instance sbatch -N 2 <launcher>
. Launcher template:
#!/bin/bash -l # <--- DO NOT FORGET '-l' to facilitate further access to ULHPC modules
#SBATCH -p <partition> #SBATCH -p <partition>
#SBATCH -N 1 #SBATCH -N 1
#SBATCH --ntasks-per-node=<n> #SBATCH --ntasks-per-node <#sockets * s>
#SBATCH -c <thread> #SBATCH --ntasks-per-socket <s>
#SBATCH -c <thread>
<n>
(left) or \#sockets \times<s>
(right) tasks per node, each on <thread>
threads.
You MUST ensure that either:
<n>
\times<thread>
matches the number of cores avaiable on the target computing node (left), or<n>
=\#sockets \times<s>
, and<s>
\times<thread>
matches the number of cores per socket available on the target computing node (right).
Node (type) | #Nodes | #Socket / #Cores | RAM [GB] | Features |
---|---|---|---|---|
aion-[0001-0354] |
354 | 8 / 128 | 256 | batch,epyc |
iris-[001-108] |
108 | 2 / 28 | 128 | batch,broadwell |
iris-[109-168] |
60 | 2 / 28 | 128 | batch,skylake |
iris-[169-186] (GPU) |
18 | 2 / 28 | 768 | gpu,skylake,volta |
iris-[191-196] (GPU) |
6 | 2 / 28 | 768 | gpu,skylake,volta32 |
iris-[187-190] (Large-Memory) |
4 | 4 / 112 | 3072 | bigmem,skylake |
16 cores per socket and 8 (virtual) sockets (CPUs) per aion
node. Examples:
#SBATCH -p batch #SBATCH -p batch #SBATCH -p batch
#SBATCH -N 1 #SBATCH -N 1 #SBATCH -N 1
#SBATCH --ntasks-per-node=128 #SBATCH --ntasks-per-node 16 #SBATCH --ntasks-per-node 8
#SBATCH --ntasks-per-socket 16 #SBATCH --ntasks-per-socket 2 #SBATCH --ntasks-per-socket 1
#SBATCH -c 1 #SBATCH -c 8 #SBATCH -c 16
14 cores per socket and 2 sockets (physical CPUs) per regular iris
. Examples:
#SBATCH -p batch #SBATCH -p batch #SBATCH -p batch
#SBATCH -N 1 #SBATCH -N 1 #SBATCH -N 1
#SBATCH --ntasks-per-node=28 #SBATCH --ntasks-per-node 14 #SBATCH --ntasks-per-node 4
#SBATCH --ntasks-per-socket=14 #SBATCH --ntasks-per-socket 7 #SBATCH --ntasks-per-socket 2
#SBATCH -c 1 #SBATCH -c 2 #SBATCH -c 7
14 cores per socket and 2 sockets (physical CPUs) per gpu iris
, 4 GPU accelerator cards per node.
You probably want to dedicate 1 task and \frac{1}{4} of the available cores to the management of each GPU accelerator. Examples:
#SBATCH -p gpu #SBATCH -p gpu #SBATCH -p gpu
#SBATCH -N 1 #SBATCH -N 1 #SBATCH -N 1
#SBATCH --ntasks-per-node=1 #SBATCH --ntasks-per-node 2 #SBATCH --ntasks-per-node 4
#SBATCH -c 7 #SBATCH --ntasks-per-socket 1 #SBATCH --ntasks-per-socket 2
#SBATCH -G 1 #SBATCH -c 7 #SBATCH -c 7
#SBATCH -G 2 #SBATCH -G 4
28 cores per socket and 4 sockets (physical CPUs) per bigmem iris
node. Examples:
#SBATCH -p bigmem #SBATCH -p bigmem #SBATCH -p bigmem
#SBATCH -N 1 #SBATCH -N 1 #SBATCH -N 1
#SBATCH --ntasks-per-node=4 #SBATCH --ntasks-per-node 8 #SBATCH --ntasks-per-node 16
#SBATCH --ntasks-per-socket=1 #SBATCH --ntasks-per-socket 2 #SBATCH --ntasks-per-socket 4
#SBATCH -c 28 #SBATCH -c 14 #SBATCH -c 7
--mem=<size[units]>
(Default units are megabytes - Different units can be specified using the suffix [K|M|G|T]
)
Basic Slurm Launcher Examples¶
1 task per job (Note: prefer GNU Parallel in that case - see below)
#!/bin/bash -l # <--- DO NOT FORGET '-l'
### Request a single task using one core on one node for 5 minutes in the batch queue
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 1
#SBATCH --time=0-00:05:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
# Safeguard for NOT running this launcher on access/login nodes
module purge || print_error_and_exit "No 'module' command"
# List modules required for execution of the task
module load <...>
# [...]
28 single-core tasks per job
#!/bin/bash -l
### Request as many tasks as cores available on a single node for 3 hours
#SBATCH -N 1
#SBATCH --ntasks-per-node=28 # On iris; for aion, use --ntasks-per-node=128
#SBATCH -c 1
#SBATCH --time=0-03:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load <...>
# [...]
7 multithreaded tasks per job (4 threads each)
#!/bin/bash -l
### Request as many tasks as cores available on a single node for 3 hours
#SBATCH -N 1
#SBATCH --ntasks-per-node=7 # On iris; for aion, use --ntasks-per-node=32
#SBATCH -c 4
#SBATCH --time=0-03:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load <...>
# [...]
Embarrassingly Parallel Tasks¶
For many users, the reason to consider (or being encouraged) to offload their computing executions on a (remote) HPC or Cloud facility is tied to the limits reached by their computing devices (laptop or workstation). It is generally motivated by time constraints
"My computations take several hours/days to complete. On an HPC, it will last a few minutes, no?"
or search-space explorations:
"I need to check my application against a huge number of input pieces (files) - it worked on a few of them locally but takes ages for a single check. How to proceed on HPC?"
In most of the cases, your favorite Java application or R/python (custom) development scripts, iterated again over multiple input conditions, are inherently SERIAL: they are able to use only one core when executed. You thus deal with what is often call a Bag of (independent) tasks, also referred to as embarrassingly parallel tasks.
In this case, you MUST NOT overload the job scheduler with a large number of small (single-core) jobs. Instead, you should use GNU Parallel which permits the effective management of such tasks in a way that optimize both the resource allocation and the completion time.
More specifically, GNU Parallel is a tool for executing tasks in parallel, typically on a single machine. When coupled with the Slurm command srun, parallel
becomes a powerful way of distributing a set of tasks amongst a number of workers. This is particularly useful when the number of tasks is significantly larger than the number of available workers (i.e. $SLURM_NTASKS
), and each tasks is independent of the others.
ULHPC Tutorial: GNU Parallel launcher for Embarrassingly Parallel Jobs
Luckily, we have prepared a generic GNU Parallel launcher that should be straight forward to adapt to your own workflow following our tutorial:
- Create a dedicated script
run_<task>
responsible to run your java/R/Python tasks while taking as argument the parameter of each run. You can inspire fromrun_stressme
for instance.- test it in interactive
-
rename the generic launcher
launcher.parallel.sh
tolauncher_<task>.sh
,- enable
#SBATCH --dependency singleton
- set the jobname
- change TASK to point to the absolute path to
run_<task>
script - set TASKLISTFILE to point to a files with the parameters to pass to your script for each task
- adapt eventually the
#SBATCH --ntasks-per-node [...]
and#SBATCH -c [...]
to match your needs AND the hardware configs of a single node (28 cores on iris, 128 cores on Aion) -- see guidelines
- enable
-
test a batch run -- stick to a single node to take the best out of one full node.
Serial Task script Launcher¶
#!/bin/bash -l # <--- DO NOT FORGET '-l'
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
# C/C++: module load toolchain/intel # OR: module load toolchain/foss
# Java: module load lang/Java/1.8
# Ruby/Perl/Rust...: module load lang/{Ruby,Perl,Rust...}
# /!\ ADAPT TASK variable accordingly - absolute path to the (serial) task to be executed
TASK=${TASK:=${HOME}/bin/app.exe}
OPTS=$*
srun ${TASK} ${OPTS}
#!/bin/bash -l
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
# Python 3.X by default (also on system)
module load lang/Python
# module load lang/SciPy-bundle
# and/or: activate the virtualenv <name> you previously generated with
# python -m venv <name>
source ./<name>/bin/activate
OPTS=$*
srun python [...] ${OPTS}
#!/bin/bash -l
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 28
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load lang/R
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*
srun Rscript <script>.R ${OPTS} |& tee job_${SLURM_JOB_NAME}.out
... but why? just use Python or R.
#!/bin/bash -l
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 28
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load math/MATLAB
matlab -nodisplay -nosplash < INPUTFILE.m > OUTPUTFILE.out
Specialized BigData/GPU launchers¶
BigData/[Large-]memory single-core tasks
#!/bin/bash -l
### Request one sequential task requiring half the memory of a regular iris node for 1 day
#SBATCH -J MyLargeMemorySequentialJob # Job name
#SBATCH --mail-user=Your.Email@Address.lu # mail me ...
#SBATCH --mail-type=end,fail # ... upon end or failure
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 1
#SBATCH --mem=64GB # if above 112GB: consider bigmem partition (USE WITH CAUTION)
#SBATCH --time=1-00:00:00
#SBATCH -p batch # if above 112GB: consider bigmem partition (USE WITH CAUTION)
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load <...>
# [...]
AI/DL task tasks
#!/bin/bash -l
### Request one GPU tasks for 4 hours - dedicate 1/4 of available cores for its management
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 7
#SBATCH -G 1
#SBATCH --time=04:00:00
#SBATCH -p gpu
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load <...> # USE apps compiled against the {foss,intel}cuda toolchain !
# Ex:
# module load numlib/cuDNN
# This should report a single GPU (over 4 available per gpu node)
nvidia-smi
# [...]
srun [...]
pthreads/OpenMP Launcher¶
Always set OMP_NUM_THREADS
to match ${SLURM_CPUS_PER_TASK:-1}
You MUST enforce the use of -c <threads>
in your launcher to ensure the variable $SLURM_CPUS_PER_TASK
exists within your launcher scripts.
This is the appropriate value to set for OMP_NUM_THREAD
, with default to 1 as extra safely which can be obtained with the following affectation:
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
Single node, threaded (pthreads/OpenMP) application launcher
#!/bin/bash -l
# Single node, threaded (pthreads/OpenMP) application launcher, using all 128 cores of an aion cluster node
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 128
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*
srun /path/to/your/threaded.app ${OPTS}
Single node, threaded (pthreads/OpenMP) application launcher
#!/bin/bash -l
# Single node, threaded (pthreads/OpenMP) application launcher, using all 28 cores of an iris cluster node:
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 28
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*
srun /path/to/your/threaded.app ${OPTS}
MPI¶
Intel MPI Launchers¶
Official Slurm guide for Intel MPI
Multi-node parallel application IntelMPI launcher
#!/bin/bash -l
# Multi-node parallel application IntelMPI launcher, using 256 MPI processes
#SBATCH -N 2
#SBATCH --ntasks-per-node 128 # MPI processes per node
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/intel
OPTS=$*
srun -n $SLURM_NTASKS /path/to/your/intel-toolchain-compiled-application ${OPTS}
si-bigmem
to request an interactive job when testing your script.
Multi-node parallel application IntelMPI launcher
#!/bin/bash -l
# Multi-node parallel application IntelMPI launcher, using 56 MPI processes
#SBATCH -N 2
#SBATCH --ntasks-per-node 28 # MPI processes per node
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/intel
OPTS=$*
srun -n $SLURM_NTASKS /path/to/your/intel-toolchain-compiled-application ${OPTS}
si-gpu
to request an interactive job when testing your script on a GPU node.
You may want to use PMIx as MPI initiator -- use srun --mpi=list
to list the available implementations (default: pmi2), and srun --mpi=pmix[_v3] [...]
to use PMIx.
OpenMPI Slurm Launchers¶
Official Slurm guide for Open MPI
Multi-node parallel application OpenMPI launcher
#!/bin/bash -l
# Multi-node parallel application OpenMPI launcher, using 256 MPI processes
#SBATCH -N 2
#SBATCH --ntasks-per-node 128 # MPI processes per node
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
module load mpi/OpenMPI
OPTS=$*
srun -n $SLURM_NTASKS /path/to/your/foss-toolchain-openMPIcompiled-application ${OPTS}
Multi-node parallel application OpenMPI launcher
#!/bin/bash -l
# Multi-node parallel application OpenMPI launcher, using 56 MPI processes
#SBATCH -N 2
#SBATCH --ntasks-per-node 28 # MPI processes per node
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
module load mpi/OpenMPI
OPTS=$*
srun -n $SLURM_NTASKS /path/to/your/foss-toolchain-openMPIcompiled-application ${OPTS}
Hybrid Intel MPI+OpenMP Launcher¶
Multi-node hybrid parallel application IntelMPI/OpenMP launcher
#!/bin/bash -l
# Multi-node hybrid application IntelMPI+OpenMP launcher, using 16 threads per socket(CPU) on 2 nodes (256 cores):
#SBATCH -N 2
#SBATCH --ntasks-per-node 8 # MPI processes per node
#SBATCH --ntasks-per-socket 1 # MPI processes per (virtual) processor
#SBATCH -c 16
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/intel
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*
srun -n $SLURM_NTASKS /path/to/your/parallel-hybrid-app ${OPTS}
Multi-node hybrid parallel application IntelMPI/OpenMP launcher
#!/bin/bash -l
# Multi-node hybrid application IntelMPI+OpenMP launcher, using 14 threads per socket(CPU) on 2 nodes (56 cores):
#SBATCH -N 2
#SBATCH --ntasks-per-node 2 # MPI processes per node
#SBATCH --ntasks-per-socket 1 # MPI processes per processor
#SBATCH -c 14
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/intel
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*
srun -n $SLURM_NTASKS /path/to/your/parallel-hybrid-app ${OPTS}
Hybrid OpenMPI+OpenMP Launcher¶
Multi-node hybrid parallel application OpenMPI/OpenMP launcher
#!/bin/bash -l
# Multi-node hybrid application OpenMPI+OpenMP launcher, using 16 threads per socket(CPU) on 2 nodes (256 cores):
#SBATCH -N 2
#SBATCH --ntasks-per-node 8 # MPI processes per node
#SBATCH --ntasks-per-socket 1 # MPI processes per processor
#SBATCH -c 16
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
module load mpi/OpenMPI
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*
srun -n $SLURM_NTASKS /path/to/your/parallel-hybrid-app ${OPTS}
Multi-node hybrid parallel application OpenMPI/OpenMP launcher
#!/bin/bash -l
# Multi-node hybrid application OpenMPI+OpenMP launcher, using 14 threads per socket(CPU) on 2 nodes (56 cores):
#SBATCH -N 2
#SBATCH --ntasks-per-node 2 # MPI processes per node
#SBATCH --ntasks-per-socket 1 # MPI processes per processor
#SBATCH -c 14
#SBATCH --time=0-01:00:00
#SBATCH -p batch
print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
module load mpi/OpenMPI
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*
srun -n $SLURM_NTASKS /path/to/your/parallel-hybrid-app ${OPTS}