Extension of Abacus 2.0 and Downtime in May

Abacus 2.0 Phase 2

During May 2016, Abacus 2.0 will be extended with 192 additional slim nodes. After the extension, the system will have a total of 584 compute nodes: 448 slim nodes (up from 256), plus the original 72 GPU nodes and 64 fat nodes.

As part of the upgrade, the entire supercomputer will be moved to a new data centre (also located at SDU, Campus Odense). Unfortunately, this means that the system will be unavailable to users during parts of May; see below for details.

Some resources, in particular the power transformer and electricity supply, are shared between the old and new data centres. We will therefore unfortunately have a few days of downtime during the preparation phase, before the actual upgrade.

Upgrade Plan

Unfortunately, the dates in the plan may change. Abacus users will be notified of any changes. See below for past events.

  • April 26: The cooling at the new data centre needs to be tested. Because the cooling system must be tested at the same power consumption as the new system, Abacus will be shut down for up to eight hours.
  • April 29: Installation of the new slim nodes in the new data centre starts.
  • First downtime (May 2 to May 13)
    • Abacus is shut down and is not accessible to users
    • Phase 1 compute nodes are moved to the new data centre and installed (May 2-13).
  • Phase 2 test period (May 14 to May 21)
    • Phase 1 nodes run as normal, now at the new data centre, and users can access all parts of the system as usual
    • Phase 2 nodes are tested
  • Second downtime (May 23 to May 29)
    • This period may start slightly later depending on issues found during the phase 2 test period
    • The entire system is checked for performance issues and other problems
    • Abacus is not accessible to users
  • Reopening of the newly expanded Abacus 2.0 (June 1)

Frequently asked questions

If you have any questions about the upgrade, you are welcome to contact us at support @ deic.sdu.dk.

  • Can I access the frontend node to look at or copy data, etc., during the three weeks of downtime in May?
    During most of the downtime, the system, including the frontend node, is not accessible to users. The frontend is connected to our GPFS storage system via an InfiniBand network, which will not be up and running during most of the first downtime. During the second downtime, nothing that can affect performance is allowed to run. This includes accessing data on the frontend, which uses a small fraction of the available bandwidth.

Timeline

November 2015: The final contract between Lenovo/Datacon and SDU was signed for the procurement of Phase 2.

January: Construction of our new data centre started in two basement rooms previously used for storage.

February 9: The main power switch for the new data centre was installed.

During February and the beginning of March, the data centre room itself was built as a room-within-a-room construction.

March 30: The power cables between our main electrical panel and the new data centre were connected.

April 4-5: Most of the indoor cooling equipment was installed, and construction of the raised floor began.