HPC basics
For demonstration purposes, we will use the Rocket Cluster of the University of Tartu.
To connect to the head-node of the HPC cluster using the secure shell protocol, run:
ssh USER@SERVER
Substitute USER with your login ID and SERVER with a hostname (an IP address or a domain name).
E.g., ssh koljalg@rocket.hpc.ut.ee.
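If you connect frequently, you may optionally add an entry to ~/.ssh/config on your own computer so that a short alias is enough; the alias name rocket below is just an example:
Host rocket                      # example alias, pick any name you like
    HostName rocket.hpc.ut.ee
    User koljalg
After that, ssh rocket will connect with these settings.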
To copy a file or multiple files to the HPC cluster, use:
scp yourfile koljalg@rocket.hpc.ut.ee:~/yourfile # single file
scp file1 file2 koljalg@rocket.hpc.ut.ee:~/ # multiple files
To copy a file from the HPC cluster (e.g., yourfile from your home directory on the HPC cluster to the home directory on your computer), use:
scp koljalg@rocket.hpc.ut.ee:~/yourfile ~/yourfile
If you have large files (or a large number of files), it's better to use the rsync program for file transfer, e.g.:
rsync -avz Documents/* koljalg@rocket.hpc.ut.ee:~/all/
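rsync works in the other direction as well, e.g. to copy results from the cluster back to your computer (the directory names below are just examples):
rsync -avz koljalg@rocket.hpc.ut.ee:~/all/results/ ~/results/  # -a preserves file attributes, -v is verbose, -z compresses data during transfer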
To end your session on the HPC cluster, run:
exit
Setup working environment on HPC cluster
In general, one needs admin rights to install software on HPC clusters. However, users may install software into their home directory, where they have write permissions. To make life easier, you may use Conda, a package manager which helps you find and install software and its dependencies.
To install Miniconda, run the following code:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
~/miniconda/bin/conda init bash
source ~/.bashrc
conda update --all --yes -c bioconda -c conda-forge
conda install --yes -c conda-forge mamba
To install software (e.g., the seqkit program), run:
mamba install -c bioconda seqkit
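To verify that the installation succeeded, you may print the program's version:
seqkit version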
Conda environments
If the software you wish to use cannot be installed in the base (default) environment due to version conflicts, if you want to use a specific version of a program, or if you simply want to keep it isolated, you may create a separate environment with:
mamba create --name VSEARCHENV -c bioconda -c conda-forge vsearch=2.21.1 blast=2.13.0
conda activate VSEARCHENV # switch to the new environment we've created
Verify which software versions are installed:
vsearch --version
blastn -version
To switch to the base environment, run:
conda deactivate
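A couple of related commands may be handy for housekeeping (VSEARCHENV is the environment created above):
conda env list                      # list all existing environments
conda env remove --name VSEARCHENV  # remove an environment you no longer need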
Module system
Alternatively, if the software you wish to use is pre-installed on the HPC cluster, you may load it as an environment module.
To list all available modules, use the module avail command (scroll the list with the space bar, press q to quit).
To search for a particular module, use e.g. module -r spider '.*singularity.*'.
If the required software was found, you need to load the module, e.g.:
module load any/singularity/3.7.3
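To check which modules are currently loaded, and to unload them once they are no longer needed, you may use:
module list                          # show loaded modules
module unload any/singularity/3.7.3  # unload a single module
module purge                         # unload all loaded modules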
Scheduling jobs on the HPC cluster
The Slurm Workload Manager, a.k.a. Simple Linux Utility for Resource Management (SLURM), is used to share the HPC resources between users.
Please note that users log in from their computers to the cluster head node.
Do not run the analysis on the head node!
You should use it only to schedule tasks, which will then be distributed across the cluster nodes.
To run the program on the HPC cluster you should:
- Prepare a batch script with directives to SLURM about the number of CPUs*, the amount of RAM*, and the run time requested for the job, along with the commands that perform the desired calculations;
- Submit the script to the batch queue. SLURM will evaluate the task’s priority and start executing the job when it reaches the front of the queue.
When the job finishes, you may retrieve the output files.
* Unless specified otherwise, on the Rocket cluster, all jobs will be allocated 1 node with 1 CPU core and 2 GB of memory.
Batch script
Here is a basic batch script that contains a minimal set of SLURM options:
#!/bin/bash -l
#SBATCH --job-name=my_job
#SBATCH --cpus-per-task=4
#SBATCH --nodes=1
#SBATCH --mem=10G
#SBATCH --partition=amd
#SBATCH --time=48:00:00
## If needed, you may load the required modules here
# module load X
## Run your code
some_program -i input.data -o output_1.data --threads 4
some_script.sh output_1.data > output_2.data
echo "Done" > output.log
The syntax for a SLURM directive in a script is #SBATCH <flag>, where <flag> can be:
- --job-name, the name of the job;
- --cpus-per-task, the number of CPUs each task should have (e.g., 4 cores);
- --nodes, the requested number of nodes (each node can have multiple CPUs);
- --mem, the requested amount of RAM (e.g., 10 gigabytes);
- --partition, the partition on which the job shall run;
- --time, the requested time for the job (e.g., 48 hours).
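By default, sbatch writes the job's standard output and error to a file named slurm-<jobid>.out in the directory from which the job was submitted. If you prefer custom log files, you may also add directives such as the following (the file names here are just examples):
#SBATCH --output=my_job_%j.log   # %j is replaced with the job ID
#SBATCH --error=my_job_%j.err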
To submit a job, save the code above to a file (e.g., my_job.sh) and run:
sbatch my_job.sh
Scheduling a task directly from the command line
If the command you wish to run is relatively simple, you may run it without a batch script, but in that case, you should provide the SLURM directives as arguments to the sbatch command:
sbatch \
--job-name=my_job \
--ntasks-per-node=4 --nodes=1 --mem=10G -p amd \
--time=48:00:00 \
some_script.sh input.data
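For a single command you do not even need a separate script file: sbatch accepts the command itself via its --wrap option (the program name and arguments below are placeholders, as above):
sbatch --job-name=my_job --cpus-per-task=4 --mem=10G -p amd --time=48:00:00 \
    --wrap="some_program -i input.data -o output.data --threads 4"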
Job management
When the job is submitted, you may monitor the queue and see the status of your running tasks:
squeue -u $USER
The most common job state codes (column ST) are:
- PD = PENDING
- R = RUNNING
- S = SUSPENDED
To cancel the job, use:
scancel <JOBID> # a single job by its ID, e.g. scancel 31727880 (see the JOBID column in the "squeue" output)
scancel --name my_job # one or more jobs by name
scancel -u $USER # all jobs of the current user
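Once a job has finished (and no longer appears in the squeue output), you may inspect its record with SLURM's accounting tool, provided that job accounting is enabled on the cluster, e.g.:
sacct -j <JOBID> --format=JobID,JobName,Elapsed,MaxRSS,State  # elapsed time, peak memory, and final state of a finished job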