# Aion Compute Nodes
Aion is a cluster of x86-64 AMD-based compute nodes.
More precisely, Aion consists of 354 "regular" computational nodes named `aion-[0001-0354]`, as follows:

| Hostname (#Nodes) | #Cores | Type | Processors | RAM | $R_\text{peak}$ [TFlops] |
|---|---|---|---|---|---|
| aion-[0001-0354] (354) | 45312 | Regular Epyc | 2x AMD Epyc ROME 7H12 @ 2.6 GHz [64c/280W] | 256 GB | 5.32 TF |
Aion compute nodes MUST be seen as 8 (virtual) processors of 16 cores each, even if each node physically hosts 2 sockets of AMD Epyc ROME 7H12 processors with 64 cores each (total: 128 cores per node).
- As will be highlighted in the Slurm resource allocation documentation, this means that targeting a full node utilization assumes that you use the following format for your jobs:

  ```
  {sbatch|srun|si|salloc} [-N <N>] --ntasks-per-node <8n> --ntasks-per-socket <n> -c <thread>
  ```

  where you want to ensure that `<n>` x `<thread>` = 16 on Aion. This brings a total of `<N>` x 8 x `<n>` tasks, each on `<thread>` threads.
  Ex: `-N 2 --ntasks-per-node 32 --ntasks-per-socket 4 -c 4` (Total: 64 tasks, each on 4 threads) -- see the sketch after this list.
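As an illustration, here is a minimal sketch of a batch script following the full-node scheme above; the job name, walltime, and application path are hypothetical placeholders.

```bash
#!/bin/bash -l
# Sketch: hybrid MPI+OpenMP job using Aion's 8x16-core (virtual) socket layout.
# 2 nodes x 32 tasks/node (4 tasks per virtual socket) x 4 threads each
# = 64 tasks, with 4 tasks x 4 threads = 16 cores filling each virtual socket.
#SBATCH -J hybrid-demo              # hypothetical job name
#SBATCH -N 2
#SBATCH --ntasks-per-node 32
#SBATCH --ntasks-per-socket 4
#SBATCH -c 4
#SBATCH --time=00:30:00             # placeholder walltime

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}   # one OpenMP thread per allocated core
srun /path/to/your/app                           # placeholder application
```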
## Processor Performance
Each Aion node relies on the AMD Epyc Rome processor architecture, which is binary compatible with the x86-64 architecture. Each processor has the following performance:
| Processor Model | #Cores | TDP(*) | CPU Freq. | $R_\text{peak}$ [TFlops] | $R_\text{max}$ [TFlops] |
|---|---|---|---|---|---|
| AMD Epyc ROME 7H12 | 64 | 280 W | 2.6 GHz | 2.66 TF | 2.13 TF |
(*) The Thermal Design Power (TDP) represents the average power, in watts, that the processor dissipates when operating at base frequency with all cores active under a manufacturer-defined, high-complexity workload.
**Theoretical $R_\text{peak}$ vs. Maximum $R_\text{max}$ Performance for AMD Epyc**

The AMD Epyc processors perform 16 Double Precision (DP) operations per cycle. Thus the reported $R_\text{peak}$ performance is computed as follows: $R_\text{peak} = \text{ops/cycle} \times \text{Freq.} \times \#\text{Cores}$.
For the estimation of the maximum performance $R_\text{max}$, an efficiency factor of 80% is applied; it is derived from the expected performance under the HPL benchmark workload.
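Plugging the 7H12 figures from the table above into these formulas reproduces the reported values:

$$
R_\text{peak} = 16\ \text{ops/cycle} \times 2.6\,\text{GHz} \times 64\ \text{cores} = 2662.4\,\text{GFlops} \approx 2.66\,\text{TF},
\qquad
R_\text{max} \approx 0.80 \times R_\text{peak} \approx 2.13\,\text{TF}.
$$

With 2 such processors per node, this also yields the per-node peak of $2 \times 2.66 = 5.32$ TF listed in the first table.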
## Regular Dual-CPU Nodes
These nodes are packaged within BullSequana X2410 AMD compute blades.
Each blade contains 3 dual-socket AMD Rome nodes side-by-side, connected to the BullSequana XH2000 local interconnect network through HDR100 ports on a mezzanine board. The BullSequana AMD blade is built upon a cold plate which cools all components by direct contact, except the DIMMs, for which custom heat spreaders evacuate the heat to the cold plate -- see the exploded view. The characteristics of each blade and its associated compute nodes are given in the table below:
| BullSequana X2410 AMD blade | |
|---|---|
| Form Factor | 1U blade comprising 3 compute nodes side-by-side |
| #Nodes per blade | 3 |
| Processors per node | 2x AMD Epyc ROME 7H12 @ 2.6 GHz [64c/280W] |
| Architecture | AMD SP3 Platform: 3x1 motherboard |
| Memory per node | 256 GB DDR4 3200 MT/s (8x16 GB DIMMs per socket, 16 DIMMs per node) |
| Network (per node) | InfiniBand HDR100 single-port mezzanine |
| Storage (per node) | 1x 480 GB SSD |
| Power supply | PSU shelves on top of the XH2000 cabinet |
| Cooling | Cooling by direct contact |
| Physical specs. (HxWxD) | 44.45 x 600 x 540 mm |
The four compute racks of Aion (one XH2000 cell) hold a total of 118 blades, i.e., 354 AMD Epyc compute nodes, totalling 45312 computing cores -- see the Aion configuration.
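These totals follow directly from the per-blade and per-node figures above:

$$
118\ \text{blades} \times 3\ \text{nodes/blade} = 354\ \text{nodes},
\qquad
354\ \text{nodes} \times 2 \times 64\ \text{cores} = 45312\ \text{cores},
$$

which, at 5.32 TF per node, corresponds to an aggregate theoretical peak of roughly $354 \times 5.32 \approx 1883$ TFlops.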