
ULHPC GPU Nodes

Each GPU node provided as part of the gpu partition features 4x Nvidia V100 SXM2 GPUs (with either 16GB or 32GB of memory) interconnected by the NVLink 2.0 architecture.
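The exact layout of the gpu partition (node names, GPU resources, core counts, memory and features) can be checked directly through Slurm; the command below is only a minimal illustration:

# Node-oriented view of the gpu partition: nodelist, GRES (GPUs), CPUs, memory, features
sinfo -p gpu --Node --format="%N %G %c %m %f"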

NVLink was designed as an alternative to PCI Express, offering higher bandwidth and additional features (e.g., shared memory) specifically designed to be compatible with Nvidia's own GPU ISA for multi-GPU systems -- see the WikiChip article.
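Once connected to a GPU node (e.g., within an interactive job), you can inspect how its GPUs are interconnected with nvidia-smi; this is a minimal illustration, and the exact matrix depends on the node and driver version:

# Display the GPU interconnect topology matrix -- NVLink connections show up as NV<n>
nvidia-smi topo -m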

Because of this hardware organization, you MUST follow the recommendations below:

  1. Do not run jobs on GPU nodes if you make no use of GPU accelerators, i.e., if you are not using any software compiled against the {foss,intel}cuda toolchains.
  2. Avoid using more than 4 GPUs, ideally within the same node (a multi-GPU variant of the launcher is sketched further below).
  3. Dedicate 1/4 of the available CPU cores to the management of each reserved GPU card.

Thus your typical GPU launcher would match the AI/DL launcher example:

#!/bin/bash -l
### Request one GPU task for 4 hours - dedicate 1/4 of the available cores for its management
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -c 7
#SBATCH -G 1
#SBATCH --time=04:00:00
#SBATCH -p gpu

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load numlib/cuDNN   # Example with cuDNN

export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK # Propagate Slurm 'cpus-per-task' to srun
[...]
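
Following recommendations 2 and 3, a job using all 4 GPUs of a single node scales the same header accordingly. The sketch below assumes 28 cores per GPU node (hence 7 cores, i.e. 1/4 of them, per GPU) and one task per GPU; adapt the walltime and loaded modules to your workload:

#!/bin/bash -l
### Request all 4 GPUs of a single node for 4 hours - one task per GPU,
### with 1/4 of the (assumed 28) available cores dedicated to each card
#SBATCH -N 1
#SBATCH --ntasks-per-node=4
#SBATCH -c 7
#SBATCH -G 4
#SBATCH --time=04:00:00
#SBATCH -p gpu

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load numlib/cuDNN   # Example with cuDNN

export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK # Propagate Slurm 'cpus-per-task' to srun
# [...] launch your GPU-enabled application here, typically through srun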

You can quickly access a GPU node for interactive jobs using si-gpu.
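If you prefer to request the resources explicitly (or si-gpu is not available in your environment), an equivalent interactive allocation can be obtained directly with srun; the line below is a sketch with an arbitrary one-hour walltime:

# Interactive shell on a GPU node: 1 task, 1 GPU, 7 cores, 1 hour
srun -p gpu -N 1 --ntasks-per-node=1 -c 7 -G 1 --time=01:00:00 --pty bash -i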

