Aion Overview

Aion is an Atos/Bull/AMD supercomputer consisting of 354 compute nodes, totaling 45312 compute cores and 90624 GB of RAM, with a peak performance of about 1.88 PetaFLOP/s.
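
As a back-of-the-envelope check (assuming two 64-core EPYC 7H12 sockets per node, i.e. 128 cores and 256 GB of RAM per node, the processor's 2.6 GHz base clock, and 16 double-precision FLOPs per core per cycle from its two 256-bit FMA units), these figures are consistent:

$$
\begin{aligned}
\#\text{cores} &= 354 \times 2 \times 64 = 45312\\
R_\text{peak} &= 45312 \times 2.6\ \text{GHz} \times 16\ \tfrac{\text{FLOP}}{\text{cycle}} \approx 1885\ \text{TFlop/s} \approx 1.88\ \text{PFlop/s}
\end{aligned}
$$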

All nodes are interconnected through a fast InfiniBand (IB) HDR100 network, configured over a Fat-Tree topology (blocking factor 1:2). Aion nodes are equipped with AMD EPYC Rome 7H12 processors.

Two global high-performance clustered file systems are available on all ULHPC computational systems: one based on GPFS/SpectrumScale and one on Lustre.

See also: Aion Compute, Aion Interconnect, Global Storage.

The cluster runs a Red Hat Linux operating system. The ULHPC Team supplies a large variety of HPC utilities, scientific applications and programming libraries to its user community on all clusters. The user software environment is generated using EasyBuild (EB) and is made available as environment modules from the compute nodes only.

Slurm is the Resource and Job Management System (RJMS) which provides computing resource allocation and job execution. For more information, see the ULHPC Slurm docs.
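
As a minimal illustration only (not taken from the ULHPC docs), a batch job is submitted through the standard sbatch command; the sketch below wraps it from Python, and the partition name and resource values are assumptions to be adapted to the official Slurm documentation:

```python
# Minimal sketch, not an official ULHPC example: submit a small test job through
# Slurm's standard sbatch CLI. The partition name and resource values below are
# assumptions -- adapt them to the ULHPC Slurm docs.
import subprocess

result = subprocess.run(
    ["sbatch",
     "--partition=batch",        # assumed partition name
     "--nodes=1",
     "--ntasks=4",
     "--time=00:05:00",
     "--wrap", "srun hostname"], # wrap a one-line command instead of a job script
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())     # e.g. "Submitted batch job <jobid>"
```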

Cluster Organization

Data Center Configuration

The Aion cluster is based on a cell made of 4 adjacent BullSequana XH2000 racks installed in the CDC (Centre de Calcul) data center of the University, within one of the DLC-enabled server rooms (CDC S-02-004) adjacent to the room hosting the Iris cluster and the global storage.

Each rack has the following dimensions: HxWxD (mm) = 2030x750x1270 (depth is 1350 mm with aesthetic doors). The full solution with 4 racks (total dimensions: HxWxD (mm) = 2030x3000x1270) has the following characteristics:

                       Rack 1    Rack 2    Rack 3    Rack 4    TOTAL
Weight [kg]            1872.4    1830.2    1830.2    1824.2    7357
#X2410 Rome Blades     30        29        29        30        118
#Compute Nodes         90        87        87        90        354
#Compute Cores         11520     11136     11136     11520     45312
R_peak [TFlops]        479.23    463.25    463.25    479.23    1884.96

For more details: BullSequana XH2000 SpecSheet (PDF)

Cooling

The BullSequana XH2000 uses an innovative fanless cooling solution which is ultra-energy-efficient (targeting a PUE very close to 1), based on an enhanced version of the Bull Direct Liquid Cooling (DLC) technology. A separate hot-water circuit minimizes the total energy consumption of the system. For more information, see [Direct] Liquid Cooling.
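
For reference, the Power Usage Effectiveness (PUE) is the ratio of the total energy drawn by the data center to the energy delivered to the IT equipment, so a value close to 1 means that almost no extra energy is spent on cooling and power distribution:

$$
\text{PUE} = \frac{E_\text{total facility}}{E_\text{IT equipment}} \geq 1
$$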

(Illustration: exploded view of a compute blade with the cold plate and heat spreaders.) The DLC [1] components in the rack are:

  • Compute nodes (CPU, Memory, Drives, GPU)
  • High Speed Interconnect: HDR
  • Management network: Ethernet management switches
  • Power Supply Unit: DLC shelves

The cooling area in the rack is composed of:

  • 3 hydraulic chassis (HYCs) at the bottom of the cabinet (10.5U height), providing 2+1 redundancy.
  • Each HYC dissipates at most 240 W into the air.
  • A primary manifold system connects the University hot-water loop to the primary water inlets of the HYCs.
  • A secondary manifold system connects the HYC outlets to each blade in the compute cabinet.

Login/Access servers

  • Aion has 2 access servers (256 GB of memory each, general access): access[1-2]
  • Each login node has two sockets, each of which is populated with an AMD EPYC 7452 processor (2.2 GHz, 32 cores)

Access servers are not meant for compute!

  • The module command is not available on the access servers, only on the compute nodes.
  • You MUST NOT run any computing process on the access servers.

Rack Cabinets

The Aion cluster (management, compute and interconnect) is installed across two adjacent server rooms in the premises of the Centre de Calcul (CDC): CDC-S02-005 and CDC-S02-004.

Server Room    Rack ID   Purpose      Type      Description
CDC-S02-005    D02       Network      -         Interconnect equipment
CDC-S02-005    A04       Management   -         Management servers, interconnect
CDC-S02-004    A01       Compute      regular   aion-[0001-0084,0319-0324], interconnect
CDC-S02-004    A02       Compute      regular   aion-[0085-0162,0325-0333], interconnect
CDC-S02-004    A03       Compute      regular   aion-[0163-0240,0334-0342], interconnect
CDC-S02-004    A04       Compute      regular   aion-[0241-0318,0343-0354], interconnect
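
The node names in the Description column use a compact bracketed range (hostlist) notation. On the clusters themselves, scontrol show hostnames expands such lists; the short Python sketch below (a hypothetical helper, not an official tool) does the same for illustration:

```python
# Minimal sketch (not an official tool): expand the bracketed hostlist notation
# used in the table above, e.g. "aion-[0001-0084,0319-0324]", into individual
# node names. On the clusters, "scontrol show hostnames" performs this expansion.
import re

def expand_hostlist(spec: str) -> list[str]:
    """Expand 'prefix[a-b,c-d,...]' into a flat list of host names."""
    match = re.fullmatch(r"(.+)\[([\d,-]+)\]", spec)
    if not match:
        return [spec]                      # plain host name, nothing to expand
    prefix, ranges = match.groups()
    hosts = []
    for part in ranges.split(","):
        start, _, end = part.partition("-")
        end = end or start
        width = len(start)                 # keep zero padding (e.g. 0001)
        for i in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{i:0{width}d}")
    return hosts

print(len(expand_hostlist("aion-[0001-0084,0319-0324]")))   # 90 nodes in rack A01
```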

In addition, the global storage equipment (GPFS/SpectrumScale and Lustre, common to both Iris and Aion clusters) is installed in another row of cabinets of the same server room.


  [1] All DLC components are built on a cold plate which cools all components by direct contact, except the DIMMs, for which custom heat spreaders evacuate the heat to the cold plate.


Last update: December 2, 2024