
Iris Compute Nodes

Iris is a cluster of x86-64 Intel-based compute nodes. More precisely, Iris consists of 196 computational nodes named iris-[001-196] and features 3 types of computing resources:

  • 168 "regular" nodes, Dual Intel Xeon Broadwell or Skylake CPU (28 cores), 128 GB of RAM
  • 24 "gpu" nodes, Dual Intel Xeon Skylake CPU (28 cores), 4 Nvidia Tesla V100 SXM2 GPU accelerators (16 or 32 GB), 768 GB RAM
  • 4 "bigmem" nodes: Quad-Intel Xeon Skylake CPU (112 cores), 3072 GB RAM
| Hostname (#Nodes) | Node type | Processor | RAM |
|---|---|---|---|
| iris-[001-108] (108) | Regular Broadwell | 2 Xeon E5-2680v4 @ 2.4GHz [14c/120W] | 128 GB |
| iris-[109-168] (60) | Regular Skylake | 2 Xeon Gold 6132 @ 2.6GHz [14c/140W] | 128 GB |
| iris-[169-186] (18) | Multi-GPU Skylake | 2 Xeon Gold 6132 @ 2.6GHz [14c/140W] + 4x Tesla V100 SXM2 16G | 768 GB |
| iris-[191-196] (6) | Multi-GPU Skylake | 2 Xeon Gold 6132 @ 2.6GHz [14c/140W] + 4x Tesla V100 SXM2 32G | 768 GB |
| iris-[187-190] (4) | Large Memory Skylake | 4 Xeon Platinum 8180M @ 2.5GHz [28c/205W] | 3072 GB |

Processors Performance

Each Iris node relies on the Intel x86_64 processor architecture, with the following performance characteristics:

| Processor Model | #Cores | TDP(*) | CPU Freq. (AVX-512 Turbo Freq.) | R_\text{peak} [TFlops] | R_\text{max} [TFlops] |
|---|---|---|---|---|---|
| Xeon E5-2680v4 (Broadwell) | 14 | 120W | 2.4GHz (n/a) | 0.538 | 0.46 |
| Xeon Gold 6132 (Skylake) | 14 | 140W | 2.6GHz (2.3GHz) | 1.03 | 0.88 |
| Xeon Platinum 8180M (Skylake) | 28 | 205W | 2.5GHz (2.3GHz) | 2.06 | 1.75 |

(*) The Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload.

Theoretical R_\text{peak} vs. Maximum R_\text{max} Performance for Intel Broadwell/Skylake

The reported R_\text{peak} performance is computed as follows for the above processors:

  • The Broadwell processors perform 16 Double Precision (DP) ops/cycle and support AVX2/FMA3.
  • The selected Skylake Gold processors have two AVX-512 units and are thus capable of performing 32 DP ops/cycle, yet only at the AVX-512 Turbo Frequency (i.e., the maximum all-core frequency in turbo mode) instead of the base non-AVX core frequency. The reported values are extracted from the reference Intel specification documentation.

Then R_\text{peak} = ops/cycle \times Freq. \times \#Cores with the appropriate frequency (2.3 GHz instead of 2.6 for our Skylake processors).

With regard to the estimation of the maximum performance R_\text{max}, an efficiency factor of 85% is applied; it is derived from the performance expected during the HPL benchmark workload.
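
For instance, for the Xeon Gold 6132 (Skylake), the figures reported above are obtained as R_\text{peak} = 32 \text{ ops/cycle} \times 2.3 \text{ GHz} \times 14 \text{ cores} \approx 1.03 TF and R_\text{max} \approx 0.85 \times 1.03 \approx 0.88 TF. The per-node peak performance reported in the sections below is then the per-processor R_\text{peak} multiplied by the number of sockets (e.g. 2 \times 1.03 \approx 2.06 TF for a regular Skylake node).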

Accelerators Performance

Iris is equipped with 96 NVIDIA Tesla V100-SXM2 GPU Accelerators with 16 or 32 GB of GPU memory, interconnected within each node through NVLink which provides higher bandwidth and improved scalability for multi-GPU system configurations.

| NVidia GPU Model | #CUDA cores | #Tensor cores | Power | Interconnect Bandwidth | GPU Memory | R_\text{peak} [TFlops] |
|---|---|---|---|---|---|---|
| V100-SXM2 | 5120 | 640 | 300W | 300 GB/s | 16 GB | 7.8 |
| V100-SXM2 | 5120 | 640 | 300W | 300 GB/s | 32 GB | 7.8 |

Regular Dual-CPU Nodes

These nodes are packaged within Dell PowerEdge C6300 chassis, each hosting 4 PowerEdge C6320 blade servers.

Broadwell Compute Nodes

Iris comprises 108 Dell C6320 "regular" compute nodes iris-[001-108] relying on the Broadwell Xeon processor generation, totalling 3024 computing cores.

  • Each node is configured as follows:
    • 2 Intel Xeon E5-2680v4 @ 2.4GHz [14c/120W]
    • RAM: 128 GB DDR4 2400MT/s (4x16 GB DIMMs per socket, 8 DIMMs per node)
    • SSD 120GB
    • InfiniBand (IB) EDR ConnectX-4 Single Port
    • Theoretical Peak Performance per Node: R_\text{peak} 1.075 TF (see processor performance)

Reserving a Broadwell node

If you want to specifically reserve a Broadwell node (iris-[001-108]), you should use the feature -C broadwell on the batch partition: {sbatch|srun|salloc} -p batch -C broadwell [...]
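
For example (adapt the node and core counts to your workload; launcher.sh denotes a generic launcher script as in the GPU example below), this could look as follows:

# Interactive job on 1 full Broadwell node (2 x 14 = 28 cores)
salloc -p batch -C broadwell -N 1 --ntasks-per-node 28

# Batch job restricted to Broadwell nodes
sbatch -p batch -C broadwell launcher.sh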

Skylake Compute Nodes

Iris also features 60 Dell C6320 "regular" compute nodes iris-[109-168] relying on the Skylake Xeon processor generation, totalling 1680 computing cores.

  • Each node is configured as follows:
    • 2 Intel Xeon Gold 6132 @ 2.6GHz [14c/140W]
    • RAM: 128 GB DDR4 2400MT/s (4x16 GB DIMMs per socket, 8 DIMMs per node)
    • SSD 120GB
    • InfiniBand (IB) EDR ConnectX-4 Single Port
    • Theoretical Peak Performance per Node: R_\text{peak} 2.061 TF (see processor performance)

Reserving a Regular Skylake node

If you want to specifically reserve a regular Skylake node (iris-[109-168]), you should use the feature -C skylake on the batch partition: {sbatch|srun|salloc} -p batch -C skylake [...]
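
As an illustration (assuming a generic MPI launcher script launcher.sh), a multi-node job pinned to the Skylake nodes could be submitted as:

# Batch job on 2 full Skylake nodes (2 x 28 = 56 cores), one task per core
sbatch -p batch -C skylake -N 2 --ntasks-per-node 28 launcher.sh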

Multi-GPU Compute Nodes

Iris includes 24 Dell PowerEdge C4140 "gpu" compute nodes, hosting a total of 96 NVIDIA Tesla V100-SXM2 GPU Accelerators.

  • Each node is configured as follows:
    • 2 Intel Xeon Gold 6132 @ 2.6GHz [14c/140W]
    • RAM: 768 GB DDR4 2666MT/s (12x 32 GB DIMMs per socket, 24 DIMMs per node)
    • 1 Dell NVMe 1.6TB
    • InfiniBand (IB) EDR ConnectX-4 Dual Port
    • 4x NVIDIA Tesla V100-SXM2 GPU Accelerators over NVLink
      • iris-[169-186] feature 16G GPU memory - use -C volta as Slurm feature
      • iris-[191-196] feature 32G GPU memory - use -C volta32 as Slurm feature
    • Theoretical Peak Performance per Node: R_\text{peak} 33.26 TF (see processor performance and accelerators performance)
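
The per-node figure combines the two Skylake CPUs with the four V100 accelerators: R_\text{peak} = 2 \times 1.03 + 4 \times 7.8 = 2.06 + 31.2 = 33.26 TF.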

Reserving a GPU node

Multi-GPU Compute Nodes can be reserved using the gpu partition. Use the -G [<type>:]<number> option to specify the total number of GPUs required for the job.

# Interactive job on 1 GPU node with 1 GPU
si-gpu -G 1
nvidia-smi      # Check allocated GPU

# Interactive job with 4 GPUs on the same node, one task per gpu, 7 cores per task
si-gpu -N 1 -G 4 --ntasks-per-node 4 --ntasks-per-socket 2 -c 7

# Job submission on 2 nodes, 4 GPUs/node and 4 tasks/node:
sbatch -p gpu -N 2 -G 4 --ntasks-per-node 4 --ntasks-per-socket 2 -c 7 launcher.sh

Do NOT reserve a GPU node if you don't need a GPU!

Multi-GPU nodes are scarce (and very expensive) resources and should be dedicated to GPU-enabled workflows.

16 GB vs. 32 GB Onboard GPU Memory
  • Compute nodes with Nvidia V100-SXM2 16GB accelerators are registered with the -C volta feature.

    • These correspond to the 18 Multi-GPU compute nodes iris-[169-186]
  • If you want to reserve GPUs with more memory (i.e. 32GB of on-board HBM2), you should use -C volta32

    • You would then land on one of the 6 Multi-GPU compute nodes iris-[191-196]
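
For example (task and core counts are illustrative; launcher.sh denotes a generic launcher script), reservations targeting a specific GPU memory size could look as follows:

# Interactive job on one node with 2 of the 32GB V100 GPUs
salloc -p gpu -C volta32 -N 1 -G 2 --ntasks-per-node 2 -c 7

# Batch job pinned to the 16GB GPU nodes
sbatch -p gpu -C volta -N 1 -G 4 --ntasks-per-node 4 -c 7 launcher.sh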

Large-Memory Compute Nodes

Iris holds 4 Dell PowerEdge R840 Large-Memory ("bigmem") compute nodes iris-[187-190], totalling 448 computing cores.

  • Each node is configured as follows:
    • 4 Xeon Platinum 8180M @ 2.5GHz [28c/205W]
    • RAM: 3072 GB DDR4 2666MT/s (12x64 GB DIMMs per socket, 48 DIMMs per node)
    • 1 Dell NVMe 1.6TB
    • InfiniBand (IB) EDR ConnectX-4 Dual Port
    • Theoretical Peak Performance per Node: R_\text{peak} 8.24 TF (see processor performance)

Reserving a Large-Memory node

These nodes can be reserved using the bigmem partition: {sbatch|srun|salloc} -p bigmem [...]
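
For example (the memory request below is purely illustrative and should match your measured needs), a single-task job using all 112 cores and 2 TB of RAM on one large-memory node could be submitted as:

# Batch job on 1 large-memory node: 1 task, 112 cores, 2 TB of RAM
sbatch -p bigmem -N 1 --ntasks-per-node 1 -c 112 --mem=2000G launcher.sh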

DO NOT use bigmem nodes...

... Unless you know what you are doing. We have too few large-memory compute nodes, so kindly keep them for workloads that truly need this kind of expensive resource.

  • In short: carefully check your workflow and memory usage before considering using these nodes!
    • use seff <jobid> or sacct -j <jobid> [...] for instance
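
For example, once a job has completed, its actual memory footprint can be inspected as follows (MaxRSS reports the peak resident memory per job step):

# Post-mortem efficiency summary (CPU and memory)
seff <jobid>
# Per-step accounting, including peak (MaxRSS) vs. requested (ReqMem) memory
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,ReqMem,State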

Last update: November 13, 2024