# Aion Overview

Aion is an Atos/Bull/AMD supercomputer consisting of 318 compute nodes, totaling 40704 compute cores and 81408 GB RAM, with a peak performance of about 1.70 PetaFLOP/s.

All nodes are interconnected through a Fast InfiniBand (IB) HDR100 network, configured over a Fat-Tree topology (blocking factor 1:2). Aion nodes are equipped with AMD EPYC Rome 7H12 processors.

Two global high-performance clustered file systems are available on all ULHPC computational systems: one based on GPFS/SpectrumScale, one on Lustre.

The cluster runs a Red Hat Linux operating system. The ULHPC Team provides its user community with a large variety of HPC utilities, scientific applications and programming libraries on all clusters. The user software environment is generated using EasyBuild (EB) and is made available as environment modules from the compute nodes only.

Slurm is the Resource and Job Management System (RJMS), which handles compute resource allocation and job execution. For more information, see the ULHPC Slurm docs.

## Cluster Organization

### Data Center Configuration

The Aion cluster is based on a cell made of 4 adjacent BullSequana XH2000 racks installed in the CDC (Centre de Calcul) data center of the University, within one of the DLC-enabled server rooms (CDC-S02-004) adjacent to the room hosting the Iris cluster and the global storage.

Each rack has the following dimensions: HxWxD (mm) = 2030x750x1270 (the depth is 1350 mm with aesthetic doors). The full solution with 4 racks (total dimensions: HxWxD (mm) = 2030x3000x1270) has the following characteristics:

| | Rack 1 | Rack 2 | Rack 3 | Rack 4 | TOTAL |
|---|---|---|---|---|---|
| Weight [kg] | 1872.4 | 1830.2 | 1830.2 | 1824.2 | 7357 |
| #X2410 Rome Blade | 28 | 26 | 26 | 26 | 106 |
| #Compute Nodes | 84 | 78 | 78 | 78 | 318 |
| #Compute Cores | 10752 | 9984 | 9984 | 9984 | 40704 |
| $R_\text{peak}$ [TFlops] | 447.28 | 415.33 | 415.33 | 415.33 | 1693.29 |
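
The per-rack $R_\text{peak}$ values above can be reproduced from the node counts. The following is a minimal sketch, not an official ULHPC tool. It assumes a 2.6 GHz base clock for the EPYC 7H12 and 16 double-precision FLOPs per cycle per core; neither figure is stated on this page. The 128 cores per node follow from 40704 cores / 318 nodes. The function name `rpeak_tflops` is purely illustrative.

```python
# Assumptions (not stated on this page): 2.6 GHz base clock for the
# EPYC 7H12 and 16 double-precision FLOPs per cycle per core.
BASE_CLOCK_GHZ = 2.6
FLOPS_PER_CYCLE = 16

# Derived from the totals in the table: 40704 cores over 318 nodes.
CORES_PER_NODE = 40704 // 318

# Compute nodes per rack, copied from the table.
NODES_PER_RACK = {"Rack 1": 84, "Rack 2": 78, "Rack 3": 78, "Rack 4": 78}


def rpeak_tflops(nodes: int) -> float:
    """Theoretical peak performance in TFlops for `nodes` compute nodes."""
    gflops_per_node = CORES_PER_NODE * BASE_CLOCK_GHZ * FLOPS_PER_CYCLE
    return nodes * gflops_per_node / 1000.0


if __name__ == "__main__":
    for rack, nodes in NODES_PER_RACK.items():
        print(f"{rack}: {rpeak_tflops(nodes):.2f} TFlops")
    total_nodes = sum(NODES_PER_RACK.values())
    print(f"TOTAL ({total_nodes} nodes): {rpeak_tflops(total_nodes):.2f} TFlops")
```

Under these assumptions the total is computed over all 318 nodes at once, which is why it can differ in the last digit from the sum of the rounded per-rack values.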


### Cooling

The BullSequana XH2000 uses a fanless, highly energy-efficient cooling solution (targeting a PUE very close to 1) based on an enhanced version of the Bull Direct Liquid Cooling (DLC) technology. A separate hot-water circuit minimizes the total energy consumption of the system. For more information, see [Direct] Liquid Cooling.

The illustration on the right shows an exploded view of a compute blade with the cold plate and heat spreaders. The DLC¹ components in the rack are:

• Compute nodes (CPU, Memory, Drives, GPU)
• High Speed Interconnect: HDR
• Management network: Ethernet management switches
• Power Supply Unit: DLC shelves

The cooling area in the rack is composed of:

• 3 hydraulic chassis (HYCs) at the bottom of the cabinet, providing 2+1 redundancy and occupying 10.5U in height.
• Each HYC dissipates at most 240 W into the air.
• A primary manifold system connects the University hot-water loop to the HYCs' primary water inlets.
• A secondary manifold system connects the HYCs' outlets to each blade in the compute cabinet.

### Login/Access Servers

• Aion has 2 access servers, access[1-2] (256 GB of memory each, general access).
• Each access server has two sockets, each populated with an AMD EPYC 7452 processor (2.2 GHz, 32 cores).

Access servers are not meant for compute!

• The module command is not available on the access servers, only on the compute nodes
• You MUST NOT run any computing process on the access servers.

## Rack Cabinets

The Aion cluster (management, compute and interconnect) is installed across two adjacent server rooms on the premises of the Centre de Calcul (CDC): CDC-S02-005 and CDC-S02-004.

| Server Room | Rack ID | Purpose | Type | Description |
|---|---|---|---|---|
| CDC-S02-005 | D02 | Network | | Interconnect equipment |
| CDC-S02-005 | A04 | Management | | Management servers, Interconnect |
| CDC-S02-004 | A01 | Compute | regular | aion-[0001-0084], interconnect |
| CDC-S02-004 | A02 | Compute | regular | aion-[0085-0162], interconnect |
| CDC-S02-004 | A03 | Compute | regular | aion-[0163-0240], interconnect |
| CDC-S02-004 | A04 | Compute | regular | aion-[0241-0318], interconnect |
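
The node ranges listed above make it possible to work out which compute rack hosts a given node. The snippet below is a small illustrative sketch, not a ULHPC-provided utility: the helper name `rack_of` is hypothetical, and it assumes hostnames follow the four-digit `aion-NNNN` pattern shown in the table.

```python
import re

# Node index ranges per compute rack in CDC-S02-004, copied from the table.
RACK_RANGES = {
    "A01": (1, 84),
    "A02": (85, 162),
    "A03": (163, 240),
    "A04": (241, 318),
}

# Assumed hostname pattern: "aion-" followed by exactly four digits.
NODE_RE = re.compile(r"^aion-(\d{4})$")


def rack_of(hostname: str) -> str:
    """Return the rack ID hosting the given Aion compute node."""
    match = NODE_RE.match(hostname)
    if match is None:
        raise ValueError(f"not an Aion compute node name: {hostname!r}")
    index = int(match.group(1))
    for rack, (first, last) in RACK_RANGES.items():
        if first <= index <= last:
            return rack
    raise ValueError(f"node index out of range: {hostname!r}")


if __name__ == "__main__":
    for name in ("aion-0001", "aion-0085", "aion-0240", "aion-0318"):
        print(name, "->", rack_of(name))
```

A lookup like this can help when correlating job placement with a rack, for instance when several nodes of one job land in the same cabinet.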

In addition, the global storage equipment (GPFS/SpectrumScale and Lustre, common to both Iris and Aion clusters) is installed in another row of cabinets of the same server room.

1. All DLC components are built on a cold plate which cools all components by direct contact, except the DIMMs, for which custom heat spreaders evacuate the heat to the cold plate.

Last update: October 6, 2021