Slurm Launcher Examples¶

ULHPC Tutorial / Getting Started ULHPC Tutorial / OpenMP/MPI

When setting your default #SBATCH directive, always keep in mind your expected default resource allocation that would permit to submit your launchers

without options sbatch <launcher> (you will be glad in a couple of month not to have to remember the options you need to pass) and
try to stick to a single node (to avoid to accidentally induce a huge submission).

Resource allocation Guidelines¶

General guidelines

Always try to align resource specifications for your jobs with physical characteristics. Always prefer the use of --ntasks-per-{node,socket} over -n when defining your tasks allocation request to automatically scale appropriately upon multi-nodes submission with for instance sbatch -N 2 <launcher>. Launcher template:

#!/bin/bash -l # <--- DO NOT FORGET '-l' to facilitate further access to ULHPC modules
#SBATCH -p <partition>                     #SBATCH -p <partition>
#SBATCH -N 1                               #SBATCH -N 1
#SBATCH --ntasks-per-node=<n>              #SBATCH --ntasks-per-node <#sockets * s>
#SBATCH -c <thread>                        #SBATCH --ntasks-per-socket <s>
                                           #SBATCH -c <thread>

This would define by default a total of <n> (left) or

$\#sockets \times$ <s> (right) tasks per node, each on <thread> threads. You MUST ensure that either:

<n> $\times$ <thread> matches the number of cores avaiable on the target computing node (left), or
<n>= $\#sockets \times$ <s>, and <s> $\times$ <thread> matches the number of cores per socket available on the target computing node (right).

See Specific Resource Allocation

Node (type)	#Nodes	#Socket / #Cores	RAM [GB]	Features
`aion-[0001-0354]`	354	8 / 128	256	`batch,epyc`
`iris-[001-108]`	108	2 / 28	128	`batch,broadwell`
`iris-[109-168]`	60	2 / 28	128	`batch,skylake`
`iris-[169-186]` (GPU)	18	2 / 28	768	`gpu,skylake,volta`
`iris-[191-196]` (GPU)	6	2 / 28	768	`gpu,skylake,volta32`
`iris-[187-190]` (Large-Memory)	4	4 / 112	3072	`bigmem,skylake`

Aion (default Dual-CPU)

16 cores per socket and 8 (virtual) sockets (CPUs) per aion node. Examples:

#SBATCH -p batch                 #SBATCH -p batch                #SBATCH -p batch
#SBATCH -N 1                     #SBATCH -N 1                    #SBATCH -N 1
#SBATCH --ntasks-per-node=128    #SBATCH --ntasks-per-node 16    #SBATCH --ntasks-per-node 8
#SBATCH --ntasks-per-socket 16   #SBATCH --ntasks-per-socket 2   #SBATCH --ntasks-per-socket 1
#SBATCH -c 1                     #SBATCH -c 8                    #SBATCH -c 16

Iris (default Dual-CPU)

14 cores per socket and 2 sockets (physical CPUs) per regular iris. Examples:

#SBATCH -p batch                #SBATCH -p batch                 #SBATCH -p batch
#SBATCH -N 1                    #SBATCH -N 1                     #SBATCH -N 1
#SBATCH --ntasks-per-node=28    #SBATCH --ntasks-per-node 14     #SBATCH --ntasks-per-node 4
#SBATCH --ntasks-per-socket=14  #SBATCH --ntasks-per-socket 7    #SBATCH --ntasks-per-socket 2
#SBATCH -c 1                    #SBATCH -c 2                     #SBATCH -c 7

Iris (GPU)

14 cores per socket and 2 sockets (physical CPUs) per gpu iris, 4 GPU accelerator cards per node. You probably want to dedicate 1 task and $\frac{1}{4}$ of the available cores to the management of each GPU accelerator. Examples:

#SBATCH -p gpu                  #SBATCH -p gpu                   #SBATCH -p gpu
#SBATCH -N 1                    #SBATCH -N 1                     #SBATCH -N 1
#SBATCH --ntasks-per-node=1     #SBATCH --ntasks-per-node 2      #SBATCH --ntasks-per-node 4
#SBATCH -c 7                    #SBATCH --ntasks-per-socket 1    #SBATCH --ntasks-per-socket 2
#SBATCH -G 1                    #SBATCH -c 7                     #SBATCH -c 7
                                #SBATCH -G 2                     #SBATCH -G 4

Iris (Large-Memory)

28 cores per socket and 4 sockets (physical CPUs) per bigmem iris node. Examples:

#SBATCH -p bigmem              #SBATCH -p bigmem                 #SBATCH -p bigmem
#SBATCH -N 1                   #SBATCH -N 1                      #SBATCH -N 1
#SBATCH --ntasks-per-node=4    #SBATCH --ntasks-per-node 8       #SBATCH --ntasks-per-node 16
#SBATCH --ntasks-per-socket=1  #SBATCH --ntasks-per-socket 2     #SBATCH --ntasks-per-socket 4
#SBATCH -c 28                  #SBATCH -c 14                     #SBATCH -c 7

You probably want to play with a single task but define the expected memory allocation with --mem=<size[units]> (Default units are megabytes - Different units can be specified using the suffix [K|M|G|T])

Basic Slurm Launcher Examples¶

Single core task

1 task per job (Note: prefer GNU Parallel in that case - see below)

#!/bin/bash -l                # <--- DO NOT FORGET '-l'
### Request a single task using one core on one node for 5 minutes in the batch queue
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 1
#SBATCH --time=0-00:05:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
# Safeguard for NOT running this launcher on access/login nodes
module purge || print_error_and_exit "No 'module' command"
# List modules required for execution of the task
module load <...>
# [...]

Multiple Single core tasks

28 single-core tasks per job

#!/bin/bash -l
### Request as many tasks as cores available on a single node for 3 hours
#SBATCH -N 1
#SBATCH --ntasks-per-node=28  # On iris; for aion, use --ntasks-per-node=128
#SBATCH -c 1
#SBATCH --time=0-03:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load <...>
# [...]

Multithreaded parallel tasks

7 multithreaded tasks per job (4 threads each)

#!/bin/bash -l
### Request as many tasks as cores available on a single node for 3 hours
#SBATCH -N 1
#SBATCH --ntasks-per-node=7  # On iris; for aion, use --ntasks-per-node=32
#SBATCH -c 4
#SBATCH --time=0-03:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load <...>
# [...]

Embarrassingly Parallel Tasks¶

For many users, the reason to consider (or being encouraged) to offload their computing executions on a (remote) HPC or Cloud facility is tied to the limits reached by their computing devices (laptop or workstation). It is generally motivated by time constraints

"My computations take several hours/days to complete. On an HPC, it will last a few minutes, no?"

or search-space explorations:

"I need to check my application against a huge number of input pieces (files) - it worked on a few of them locally but takes ages for a single check. How to proceed on HPC?"

In most of the cases, your favorite Java application or R/python (custom) development scripts, iterated again over multiple input conditions, are inherently SERIAL: they are able to use only one core when executed. You thus deal with what is often call a Bag of (independent) tasks, also referred to as embarrassingly parallel tasks.

In this case, you MUST NOT overload the job scheduler with a large number of small (single-core) jobs. Instead, you should use GNU Parallel which permits the effective management of such tasks in a way that optimize both the resource allocation and the completion time.

More specifically, GNU Parallel is a tool for executing tasks in parallel, typically on a single machine. When coupled with the Slurm command srun, parallel becomes a powerful way of distributing a set of tasks amongst a number of workers. This is particularly useful when the number of tasks is significantly larger than the number of available workers (i.e. $SLURM_NTASKS), and each tasks is independent of the others.

ULHPC Tutorial: GNU Parallel launcher for Embarrassingly Parallel Jobs

Luckily, we have prepared a generic GNU Parallel launcher that should be straight forward to adapt to your own workflow following our tutorial:

Create a dedicated script run_<task> responsible to run your java/R/Python tasks while taking as argument the parameter of each run. You can inspire from run_stressme for instance.
- test it in interactive
rename the generic launcher launcher.parallel.sh to launcher_<task>.sh,
- enable #SBATCH --dependency singleton
- set the jobname
- change TASK to point to the absolute path to run_<task> script
- set TASKLISTFILE to point to a files with the parameters to pass to your script for each task
- adapt eventually the #SBATCH --ntasks-per-node [...] and #SBATCH -c [...] to match your needs AND the hardware configs of a single node (28 cores on iris, 128 cores on Aion) -- see guidelines
test a batch run -- stick to a single node to take the best out of one full node.

Serial Task script Launcher¶

Serial Killer (Generic template)

#!/bin/bash -l     # <--- DO NOT FORGET '-l'
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
# C/C++: module load toolchain/intel # OR: module load toolchain/foss
# Java:  module load lang/Java/1.8
# Ruby/Perl/Rust...:  module load lang/{Ruby,Perl,Rust...}
# /!\ ADAPT TASK variable accordingly - absolute path to the (serial) task to be executed
TASK=${TASK:=${HOME}/bin/app.exe}
OPTS=$*

srun ${TASK} ${OPTS}

Serial Python

#!/bin/bash -l
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
# Python 3.X by default (also on system)
module load lang/Python
# module load lang/SciPy-bundle
# and/or: activate the virtualenv <name> you previously generated with
#     python -m venv <name>
source ./<name>/bin/activate
OPTS=$*

srun python [...] ${OPTS}

R

#!/bin/bash -l
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 28
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load lang/R
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*

srun Rscript <script>.R ${OPTS}  |& tee job_${SLURM_JOB_NAME}.out

Matlab

... but why? just use Python or R.

#!/bin/bash -l
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 28
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load math/MATLAB

matlab -nodisplay -nosplash < INPUTFILE.m > OUTPUTFILE.out

Specialized BigData/GPU launchers¶

BigData/[Large-]memory single-core tasks

#!/bin/bash -l
### Request one sequential task requiring half the memory of a regular iris node for 1 day
#SBATCH -J MyLargeMemorySequentialJob       # Job name
#SBATCH --mail-user=Your.Email@Address.lu   # mail me ...
#SBATCH --mail-type=end,fail                # ... upon end or failure
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 1
#SBATCH --mem=64GB         # if above 112GB: consider bigmem partition (USE WITH CAUTION)
#SBATCH --time=1-00:00:00
#SBATCH -p batch           # if above 112GB: consider bigmem partition (USE WITH CAUTION)

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load <...>
# [...]

AI/DL task tasks

#!/bin/bash -l
### Request one GPU tasks for 4 hours - dedicate 1/4 of available cores for its management
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 7
#SBATCH -G 1
#SBATCH --time=04:00:00
#SBATCH -p gpu

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load <...>    # USE apps compiled against the {foss,intel}cuda toolchain !
# Ex: 
# module load numlib/cuDNN

# This should report a single GPU (over 4 available per gpu node)
nvidia-smi
# [...]
srun [...]

pthreads/OpenMP Launcher¶

Always set OMP_NUM_THREADS to match ${SLURM_CPUS_PER_TASK:-1}

You MUST enforce the use of -c <threads> in your launcher to ensure the variable $SLURM_CPUS_PER_TASK exists within your launcher scripts. This is the appropriate value to set for OMP_NUM_THREAD, with default to 1 as extra safely which can be obtained with the following affectation:

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

Aion (default Dual-CPU)

Single node, threaded (pthreads/OpenMP) application launcher

#!/bin/bash -l
# Single node, threaded (pthreads/OpenMP) application launcher, using all 128 cores of an aion cluster node
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 128
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*

srun /path/to/your/threaded.app ${OPTS}

Iris (default Dual-CPU)

Single node, threaded (pthreads/OpenMP) application launcher

#!/bin/bash -l
# Single node, threaded (pthreads/OpenMP) application launcher, using all 28 cores of an iris cluster node:
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 28
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*

srun /path/to/your/threaded.app ${OPTS}

MPI¶

Intel MPI Launchers¶

Official Slurm guide for Intel MPI

Aion (default Dual-CPU)

Multi-node parallel application IntelMPI launcher

#!/bin/bash -l
# Multi-node parallel application IntelMPI launcher, using 256 MPI processes

#SBATCH -N 2
#SBATCH --ntasks-per-node 128    # MPI processes per node
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/intel
OPTS=$*

srun -n $SLURM_NTASKS /path/to/your/intel-toolchain-compiled-application ${OPTS}

Recall to use si-bigmem to request an interactive job when testing your script.

Iris (default Dual-CPU)

Multi-node parallel application IntelMPI launcher

#!/bin/bash -l
# Multi-node parallel application IntelMPI launcher, using 56 MPI processes

#SBATCH -N 2
#SBATCH --ntasks-per-node 28    # MPI processes per node
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/intel
OPTS=$*

srun -n $SLURM_NTASKS /path/to/your/intel-toolchain-compiled-application ${OPTS}

Recall to use si-gpu to request an interactive job when testing your script on a GPU node.

You may want to use PMIx as MPI initiator -- use srun --mpi=list to list the available implementations (default: pmi2), and srun --mpi=pmix[_v3] [...] to use PMIx.

OpenMPI Slurm Launchers¶

Official Slurm guide for Open MPI

Aion (default Dual-CPU)

Multi-node parallel application OpenMPI launcher

#!/bin/bash -l
# Multi-node parallel application OpenMPI launcher, using 256 MPI processes

#SBATCH -N 2
#SBATCH --ntasks-per-node 128    # MPI processes per node
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
module load mpi/OpenMPI
OPTS=$*

srun -n $SLURM_NTASKS /path/to/your/foss-toolchain-openMPIcompiled-application ${OPTS}

Iris (default Dual-CPU)

Multi-node parallel application OpenMPI launcher

#!/bin/bash -l
# Multi-node parallel application OpenMPI launcher, using 56 MPI processes

#SBATCH -N 2
#SBATCH --ntasks-per-node 28    # MPI processes per node
#SBATCH -c 1
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
module load mpi/OpenMPI
OPTS=$*

srun -n $SLURM_NTASKS /path/to/your/foss-toolchain-openMPIcompiled-application ${OPTS}

Hybrid Intel MPI+OpenMP Launcher¶

Aion (default Dual-CPU)

Multi-node hybrid parallel application IntelMPI/OpenMP launcher

#!/bin/bash -l
# Multi-node hybrid application IntelMPI+OpenMP launcher, using 16 threads per socket(CPU) on 2 nodes (256 cores):

#SBATCH -N 2
#SBATCH --ntasks-per-node   8    # MPI processes per node
#SBATCH --ntasks-per-socket 1    # MPI processes per (virtual) processor
#SBATCH -c 16
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/intel
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*

srun -n $SLURM_NTASKS /path/to/your/parallel-hybrid-app ${OPTS}

Iris (default Dual-CPU)

Multi-node hybrid parallel application IntelMPI/OpenMP launcher

#!/bin/bash -l
# Multi-node hybrid application IntelMPI+OpenMP launcher, using 14 threads per socket(CPU) on 2 nodes (56 cores):

#SBATCH -N 2
#SBATCH --ntasks-per-node   2    # MPI processes per node
#SBATCH --ntasks-per-socket 1    # MPI processes per processor
#SBATCH -c 14
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/intel
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*

srun -n $SLURM_NTASKS /path/to/your/parallel-hybrid-app ${OPTS}

Hybrid OpenMPI+OpenMP Launcher¶

Aion (default Dual-CPU)

Multi-node hybrid parallel application OpenMPI/OpenMP launcher

#!/bin/bash -l
# Multi-node hybrid application OpenMPI+OpenMP launcher, using 16 threads per socket(CPU) on 2 nodes (256 cores):

#SBATCH -N 2
#SBATCH --ntasks-per-node   8    # MPI processes per node
#SBATCH --ntasks-per-socket 1    # MPI processes per processor
#SBATCH -c 16
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
module load mpi/OpenMPI
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*

srun -n $SLURM_NTASKS /path/to/your/parallel-hybrid-app ${OPTS}

Iris (default Dual-CPU)

Multi-node hybrid parallel application OpenMPI/OpenMP launcher

#!/bin/bash -l
# Multi-node hybrid application OpenMPI+OpenMP launcher, using 14 threads per socket(CPU) on 2 nodes (56 cores):

#SBATCH -N 2
#SBATCH --ntasks-per-node   2    # MPI processes per node
#SBATCH --ntasks-per-socket 1    # MPI processes per processor
#SBATCH -c 14
#SBATCH --time=0-01:00:00
#SBATCH -p batch

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load toolchain/foss
module load mpi/OpenMPI
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
OPTS=$*

srun -n $SLURM_NTASKS /path/to/your/parallel-hybrid-app ${OPTS}