The compute nodes are connected using high-speed InfiniBand FDR (56 Gbit/s) equipment from Mellanox. To reduce cost while maintaining good inter-node speed, the InfiniBand switches are arranged in a 3x3x4 3D torus, as shown in the figure below.
Each box in the figure, e.g., s51 (1,2,1), corresponds to an InfiniBand switch.
- The slim nodes are connected to the 28 blue switches (16 nodes/switch).
- The GPU nodes are connected to the 4 green switches (18 nodes/switch).
- The fat nodes are connected to the 4 yellow switches (16 nodes/switch).
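As a quick consistency check, the switch counts listed above can be added up and compared with the number of positions in a 3x3x4 torus. The short Python sketch below does that arithmetic; the per-type capacities it prints assume that every switch is fully populated, which the text does not state explicitly, and the names `SWITCH_GROUPS` and `capacity` are only illustrative.

```python
# Switch groups as listed above: (number of switches, nodes per switch).
SWITCH_GROUPS = {"slim": (28, 16), "gpu": (4, 18), "fat": (4, 16)}

# Total number of switches across all node types.
total_switches = sum(count for count, _ in SWITCH_GROUPS.values())

# Number of positions in a 3x3x4 torus.
torus_positions = 3 * 3 * 4

# Node capacity per type, ASSUMING every switch is fully populated.
capacity = {
    name: count * per_switch
    for name, (count, per_switch) in SWITCH_GROUPS.items()
}

print(total_switches, torus_positions)  # both are 36
print(capacity)
```

The 28 + 4 + 4 = 36 switches fill the 3 x 3 x 4 = 36 torus positions exactly.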
Each switch is connected to its 6 neighbour switches, e.g., s24 (2,1,4) is connected to:
- x-direction: s14 (1,1,4) and s34 (3,1,4)
- y-direction: s94 (2,3,4) and s64 (2,2,4)
- z-direction: s23 (2,1,3) and s21 (2,1,1)
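The wrap-around rule behind this example can be sketched in a few lines of Python. This is a minimal illustration that works on the (x,y,z) coordinates only; it assumes 1-based coordinates as used in the figure and does not try to reproduce the switch names, since the naming scheme is not spelled out here. The function name `torus_neighbours` is hypothetical.

```python
# Torus dimensions taken from the text: 3 x 3 x 4 switches.
DIMS = (3, 3, 4)


def torus_neighbours(coord, dims=DIMS):
    """Return the six neighbours of a 1-based (x, y, z) coordinate.

    Each axis wraps around, so position 1 and the last position
    on an axis are neighbours of each other.
    """
    result = {}
    for axis, name in enumerate("xyz"):
        size = dims[axis]
        pair = []
        for step in (-1, 1):
            shifted = list(coord)
            # Convert to 0-based, step with wrap-around, convert back to 1-based.
            shifted[axis] = (coord[axis] - 1 + step) % size + 1
            pair.append(tuple(shifted))
        result[name] = pair
    return result


for direction, pair in torus_neighbours((2, 1, 4)).items():
    print(direction, pair)
```

For (2,1,4) this yields (1,1,4) and (3,1,4) in the x-direction, (2,3,4) and (2,2,4) in the y-direction, and (2,1,3) and (2,1,1) in the z-direction, matching the coordinates in the list above.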
The storage nodes are connected to all the switches in the s2x, s6x and s9x columns.
Where possible, the job scheduler (Slurm) places jobs so that their nodes are as closely connected as possible and span as few switches as possible. A job that uses at most 16 nodes (slim or fat nodes) or 18 nodes (GPU nodes) can be scheduled with all of its nodes on the same switch. For more information, see our documentation on Slurm.
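To estimate how many switches a larger job has to span at a minimum, the nodes-per-switch figures above can be turned into a small calculation. The Python sketch below gives only a lower bound: it assumes all requested nodes are of one type and that enough nodes are free on each switch, which the scheduler cannot always guarantee. The helper name `min_switches` is hypothetical.

```python
import math

# Nodes per switch for each node type, as stated in the text.
NODES_PER_SWITCH = {"slim": 16, "fat": 16, "gpu": 18}


def min_switches(node_type, num_nodes):
    """Lower bound on the number of switches a job of num_nodes must span."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return math.ceil(num_nodes / NODES_PER_SWITCH[node_type])


for node_type, n in [("slim", 16), ("slim", 17), ("gpu", 18), ("gpu", 40)]:
    print(node_type, n, min_switches(node_type, n))
```

Slurm's `sbatch` also has a `--switches=<count>[@<max-time>]` option for requesting a maximum number of switches; whether and how it takes effect on this cluster depends on the local configuration, so check the Slurm documentation before relying on it.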