Phylo2021

Phylogenetic reconstruction workshop, Tartu, 2021

This project is maintained by Mycology-Microbiology-Center

3. Phylogenetic tree building

If you installed the software in a separate environment with conda, you should activate it prior the analysis:

conda activate PHYLO

Input data

If you do not have your own data, please download example files, which include:

You can download the files directly on a cluster with wget using the following syntax:

wget https://path/to/file

RaxMl-NG

For maximum likelihood-based phylogenetic analysis we will use RaxMl-NG.

First, download the required input files (alignment in fasta format and file with partitions):

wget https://raw.githubusercontent.com/Mycology-Microbiology-Center/Phylo2021/main/data/alignment.fas
wget https://raw.githubusercontent.com/Mycology-Microbiology-Center/Phylo2021/main/data/partitions.txt

Then, lest’s create a script (raxml.sh) that will perform the following steps:

cat > raxml.sh <<'EOT'
#!/bin/bash

  ## Input data:
  # $1 = input fasta, e.g. alignment.fas
  # $2 = part model,  e.g. partitions.txt
  # $3 = prefix (T1)
  # $4 = prefix (T2)
  # $5 = prefix (T3)
  # $6 = prefix (T4)

echo "Start analysis for " "$1" " + " "$2"

# Sanity check of alignment
raxml-ng --check --msa "$1" --model "$2"

# ML search for best topology among 10 random + 10 parsimony trees
raxml-ng --msa "$1" --model "$2" --prefix "$3" --threads 4 --seed 123 --brlen scaled 

# Make 1000 standard bootstraps
raxml-ng --bootstrap --msa "$1" --model "$2" --prefix "$4" --seed 123 --threads 2 --bs-trees 1000

# Check BS convergence
raxml-ng --bsconverge --bs-trees "$4".raxml.bootstraps --prefix "$5" --seed 123 --threads 4 --bs-cutoff 0.01

# Calculate and apply branch support to the best tree found in round T2 ("$4")
raxml-ng --support --tree "$3".raxml.bestTree --bs-trees "$4".raxml.bootstraps --prefix "$6" --threads 4

echo "RaxMl analysis finished: " "$1" " + " "$2"
EOT

Please consider this script as a simplified example.
Here we use positional arguments (e.g., $1) for input data and parameters.
To keep the things simple, we also omit some sanity checks of the input data and do not trace the execution status of each command (if something is wrong with the data, there is a chance that commands could terminate with an error!).

To schedule the job use:

sbatch \
  --job-name=raxml \
  --ntasks-per-node=4 --nodes=1 --mem=2G \
  --time=02:00:00 \
  raxml.sh alignment.fas partitions.txt T1 T2 T3 T4

MrBayes

For Bayesian inference of phylogenetic trees we will use MrBayes program.

Download input data(alignment in nexus format):

wget https://raw.githubusercontent.com/Mycology-Microbiology-Center/Phylo2021/main/data/alignment_mrbayes.nex

Create a batch script script (mrbayes.sh) with a command to execute MrBayes:

cat > mrbayes.sh <<'EOT'
#!/bin/bash
#SBATCH --job-name=mrbayes
#SBATCH --ntasks=1
#SBATCH --mem=500M
#SBATCH --time=00:30:00

echo "Starting MrBayes analysis"

mb alignment_mrbayes.nex > mrbayes_log.txt

echo "Analysis finished"
EOT

Please note, that some example-specific values (MrBayes block) are hard-coded in nexus file. If you use your own data, adjust the file accordingly.

Schedule the job:

sbatch mrbayes.sh

Resulting trees

When the jobs finish, you should obtain two trees (ML-based and Bayesian).
If something haven’t worked for you, please download pre-computed trees:

We are ready to proceed to the next step - 4. Visualization of phylogenetic trees: preparation.

Home