Job scheduling on our HPC system is done using Slurm (the Simple Linux Utility for Resource Management), a free and open-source job scheduler used by many of the world's supercomputers and computer clusters.
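Jobs are submitted to Slurm as batch scripts. The following is a minimal sketch; the job name, resource amounts, and wall time are illustrative placeholders, not system defaults:

```shell
#!/bin/bash
# Minimal Slurm batch script (illustrative values only).
#SBATCH --job-name=example
#SBATCH --nodes=1              # number of nodes
#SBATCH --ntasks=1             # number of tasks (processes)
#SBATCH --time=01:00:00        # requested wall time, HH:MM:SS
#SBATCH --output=slurm-%j.out  # %j expands to the job ID

# Launch the workload through srun.
srun hostname
```

The script is submitted with `sbatch job.sh`, queued jobs are listed with `squeue -u "$USER"`, and the available partitions can be inspected with `sinfo`.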
The queue system is configured with three distinct partitions, corresponding to the three different types of nodes available:
Queue priority is based on each job's waiting time and on a fairshare factor derived from the resources allocated to each group. This balances two goals: jobs tend to run in the order they were submitted (FIFO), while no user group can occupy an entire node partition for extended periods (fairshare).
The fairshare is updated each month so that unused node hours from previous months do not carry over: even if a group uses no node hours for two months, in the third month its fairshare corresponds only to the fraction of time allocated to it in that month.
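The factors described above can be inspected directly with Slurm's standard reporting commands, assuming the multifactor priority plugin is in use on this system:

```shell
# Show your association's fairshare factor; a lower value
# indicates heavier recent usage relative to your allocation.
sshare -u "$USER"

# Break down the priority of pending jobs into its factors,
# e.g. age (waiting time) and fairshare.
sprio -u "$USER"
```

Comparing the age and fairshare columns of `sprio` shows how waiting time and recent group usage combine into a job's position in the queue.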
Jobs are backfilled: small jobs that fit into gaps earlier in the schedule are allowed to run, provided they do not delay larger jobs scheduled later.
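The start times computed by the backfill scheduler can be queried for your own pending jobs; requesting an accurate wall time (rather than the maximum) makes it easier for the scheduler to fit a job into such a gap:

```shell
# Show the scheduler's estimated start time for pending jobs.
squeue -u "$USER" --start
```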
To guarantee fair turnaround of jobs and projects, and to enable more effective use of HPC resources, the wall time of jobs is limited to at most 24 hours.
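In a batch script this limit is reflected in the `--time` request; a fragment assuming the 24-hour cap described above:

```shell
# Request the maximum allowed wall time on this system.
# Requests exceeding the partition limit would not be scheduled.
#SBATCH --time=24:00:00
```

Longer workloads must therefore be split into shorter jobs, typically by checkpointing and resubmitting.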