Phylogenetic reconstruction workshop, Tartu, 2021
This project is maintained by Mycology-Microbiology-Center
If you installed the software in a separate conda environment, activate it before starting the analysis:
conda activate PHYLO
If you do not have your own data, please download the example files, which include:
alignment.fas, aligned sequences in FASTA format;
partitions.txt, alignment coordinates of different parts of the gene (will be used as RAxML input);
alignment_mrbayes.nex, aligned sequences in NEXUS format with an additional MrBayes block containing commands for the analysis.
You can download the files directly on a cluster with wget using the following syntax:
wget https://path/to/file
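As a concrete instance of this syntax, the three example files can be fetched in one loop. This is a minimal sketch: the base URL is the one used for the individual downloads in this tutorial, while the helper names `example_url` and `download_examples` are made up for illustration.

```shell
# Base URL of the workshop data folder (taken from the download commands in this tutorial).
BASE_URL="https://raw.githubusercontent.com/Mycology-Microbiology-Center/Phylo2021/main/data"

# Hypothetical helper: build the full URL for one example file.
example_url() {
    echo "${BASE_URL}/$1"
}

# Hypothetical helper: download all three example files into the current directory.
# Stops at the first failed download.
download_examples() {
    local f
    for f in alignment.fas partitions.txt alignment_mrbayes.nex; do
        wget "$(example_url "$f")" || return 1
    done
}
```

Calling `download_examples` on the cluster should leave the three files in the working directory.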
For maximum likelihood-based phylogenetic analysis we will use RAxML-NG.
First, download the required input files (the alignment in FASTA format and the partition file):
wget https://raw.githubusercontent.com/Mycology-Microbiology-Center/Phylo2021/main/data/alignment.fas
wget https://raw.githubusercontent.com/Mycology-Microbiology-Center/Phylo2021/main/data/partitions.txt
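For orientation, a RAxML-NG partition file has one line per partition in the form `model, name = start-end`. The sketch below writes a made-up example to a separate file so that the downloaded partitions.txt is not overwritten. The partition names, models and coordinates are assumptions for illustration and do not describe the workshop data.

```shell
# Write a hypothetical partition file (names, models and coordinates are invented).
cat > example_partitions.txt <<'EOT'
GTR+G, ITS1 = 1-250
K80+G, 5_8S = 251-410
GTR+G, ITS2 = 411-700
EOT

# Show the result
cat example_partitions.txt
```

Each line assigns a substitution model to a range of alignment columns, which is how `--model` can receive a file name instead of a single model string.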
Then, let's create a script (raxml.sh) that will perform the following steps:
cat > raxml.sh <<'EOT'
#!/bin/bash
## Input data:
# $1 = input fasta, e.g. alignment.fas
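# $2 = partition model file, e.g. partitions.txt
# $3 = output prefix for the ML tree search (T1)
# $4 = output prefix for the bootstrap replicates (T2)
# $5 = output prefix for the bootstrap convergence check (T3)
# $6 = output prefix for the tree with branch support (T4)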
echo "Start analysis for " "$1" " + " "$2"
# Sanity check of alignment
raxml-ng --check --msa "$1" --model "$2"
# ML search for best topology among 10 random + 10 parsimony trees
raxml-ng --msa "$1" --model "$2" --prefix "$3" --threads 4 --seed 123 --brlen scaled
# Make 1000 standard bootstraps
raxml-ng --bootstrap --msa "$1" --model "$2" --prefix "$4" --seed 123 --threads 2 --bs-trees 1000
# Check BS convergence
raxml-ng --bsconverge --bs-trees "$4".raxml.bootstraps --prefix "$5" --seed 123 --threads 4 --bs-cutoff 0.01
# Calculate and apply branch support to the best tree found in round T1 ("$3"), using the bootstraps from round T2 ("$4")
raxml-ng --support --tree "$3".raxml.bestTree --bs-trees "$4".raxml.bootstraps --prefix "$6" --threads 4
echo "RAxML-NG analysis finished: " "$1" " + " "$2"
EOT
Please consider this script a simplified example.
Here we use positional arguments (e.g., $1) for input data and parameters.
To keep things simple, we also omit some sanity checks of the input data and do not track the exit status of each command. If something is wrong with the data, a command may terminate with an error, and the script will still move on to the next step.
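A few of the omitted checks could look like the sketch below. It is not part of the workshop script: the function name `check_inputs` is hypothetical, and the checks only cover the argument count and the presence of the two input files.

```shell
# Hypothetical helper: validate the arguments passed to raxml.sh.
# Returns 0 if six arguments are given and both input files exist and are non-empty.
check_inputs() {
    if [ "$#" -ne 6 ]; then
        echo "Usage: raxml.sh <alignment> <partitions> <T1> <T2> <T3> <T4>" >&2
        return 1
    fi
    local f
    for f in "$1" "$2"; do
        if [ ! -s "$f" ]; then
            echo "Input file is missing or empty: $f" >&2
            return 1
        fi
    done
    return 0
}
```

Placed near the top of raxml.sh, it could be called as `check_inputs "$@" || exit 1`. Adding `set -e` after the shebang line would additionally stop the script at the first failing command.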
To schedule the job use:
sbatch \
--job-name=raxml \
--ntasks-per-node=4 --nodes=1 --mem=2G \
--time=02:00:00 \
raxml.sh alignment.fas partitions.txt T1 T2 T3 T4
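After submission, sbatch prints a confirmation line with the job ID, which is handy for monitoring. The helper below is a small sketch (the function name `job_id_from_sbatch` is made up) that extracts the ID from that line.

```shell
# Hypothetical helper: read sbatch's confirmation line from stdin
# ("Submitted batch job 123456") and print only the job ID.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Example with a canned confirmation line (no cluster needed):
echo "Submitted batch job 123456" | job_id_from_sbatch
```

On the cluster, the output of the sbatch command can be piped into this function and stored in a variable, for example `jobid`. Then `squeue -j "$jobid"` shows the job state, and, unless another path was requested, the script's output is collected in `slurm-${jobid}.out` in the submission directory.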
For Bayesian inference of phylogenetic trees we will use the MrBayes program.
Download the input data (alignment in NEXUS format):
wget https://raw.githubusercontent.com/Mycology-Microbiology-Center/Phylo2021/main/data/alignment_mrbayes.nex
Create a batch script (mrbayes.sh) with a command to execute MrBayes:
cat > mrbayes.sh <<'EOT'
#!/bin/bash
#SBATCH --job-name=mrbayes
#SBATCH --ntasks=1
#SBATCH --mem=500M
#SBATCH --time=00:30:00
echo "Starting MrBayes analysis"
mb alignment_mrbayes.nex > mrbayes_log.txt
echo "Analysis finished"
EOT
Please note that some example-specific values (the MrBayes block) are hard-coded in the NEXUS file. If you use your own data, adjust the file accordingly.
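To give an idea of what such a block looks like, the sketch below writes a generic MrBayes block to a separate file for inspection. It is not the block from alignment_mrbayes.nex: the substitution model, number of generations and sampling frequencies are illustrative assumptions that would need to be chosen for real data.

```shell
# Write a hypothetical MrBayes block to a separate file for inspection.
# All values (model, number of generations, sampling frequency) are
# illustrative assumptions, not the settings of the workshop file.
cat > example_mrbayes_block.txt <<'EOT'
begin mrbayes;
    set autoclose=yes nowarn=yes;
    lset nst=6 rates=gamma;
    mcmc ngen=100000 samplefreq=100 printfreq=1000 nchains=4;
    sump;
    sumt;
end;
EOT

cat example_mrbayes_block.txt
```

In a NEXUS file, a block like this follows the data block. Here `lset` sets the substitution model, `mcmc` runs the chains, and `sump` and `sumt` summarize the sampled parameters and trees. The `set autoclose=yes nowarn=yes` line lets the run proceed without interactive prompts, which matters for batch jobs.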
Schedule the job:
sbatch mrbayes.sh
When the jobs finish, you should obtain two trees (ML-based and Bayesian).
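A quick way to see whether both jobs produced output is to look for the expected files. The helper below is a sketch with a made-up name (`missing_outputs`). It assumes the prefixes T1, T2 and T4 from the sbatch command above and that RAxML-NG writes the tree with support values to a file ending in `.raxml.support`. The name of the Bayesian consensus tree depends on the MrBayes block, so only the MrBayes log is checked.

```shell
# Hypothetical helper: print every expected result file that is missing or empty.
# Prefixes T1, T2 and T4 match the sbatch command used in this tutorial.
missing_outputs() {
    local f
    for f in T1.raxml.bestTree T2.raxml.bootstraps T4.raxml.support mrbayes_log.txt; do
        [ -s "$f" ] || echo "$f"
    done
}
```

Running `missing_outputs` in the working directory prints nothing when all four files are present. Any file name it prints points to the step worth inspecting in the job logs.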
If something has not worked for you, please download the pre-computed trees:
We are ready to proceed to the next step - 4. Visualization of phylogenetic trees: preparation.