The root of the animal phylogeny
The analyses presented here further explore the datasets presented by Ryan et al. 2013. In that study, PhyloBayes was used only with the cat model (which allows equilibrium frequencies to vary across sites) and not with the more computationally intensive cat+gtr model (which additionally estimates the exchange rates between states instead of assuming they are equal). PhyloBayes analyses of the genome datasets converged under the cat model, but PhyloBayes analyses of the EST datasets did not, even after months of run time.
Since those published analyses were initiated, a new MPI-enabled version of PhyloBayes has been released. This makes it possible to run large, computationally intensive PhyloBayes analyses more effectively on computer clusters. Here we present the application of PhyloBayes-MPI to the molecular sequence matrices presented by Ryan et al. 2013, as well as several other relevant analyses.
The source code for this document, the data matrices, shell scripts used to implement the analyses, and the result files are all available at https://bitbucket.org/caseywdunn/mnemiopsis_trees_2014/src.
The findings of the new analyses presented here include:
- The cat+gtr PhyloBayes analyses of the genome matrices and the EST matrices show no sign of converging (maxdiff is 1), but in all cases that have been evaluated the rooting of the preliminary cat+gtr trees is congruent with the rooting of the corresponding trees inferred under the cat model. There is therefore no indication of incongruence between the trees recovered under the cat model and those recovered under the more complex cat+gtr model.
- The rooting of genome analyses is sensitive to both model selection (cat vs. cat+gtr) and outgroup selection. The novel Ctenophora+Porifera clade obtained in some cat and cat+gtr analyses of the genome matrices is strongly rejected by the gene content data, according to the SOWH test.
- Analyses of a reduced-taxon Opisthokonta EST matrix do converge under the cat model. This analysis, like the non-convergent cat+gtr analyses of the full EST matrices, places Ctenophora as the sister group to all other animals. This result is sensitive to outgroup sampling, and the equivalent Holozoa matrix places Porifera as the sister group to all other animals.
Gene content analyses
The cat analyses of the genome sequence datasets recovered a clade comprised of Ctenophora+Porifera (Table 1, Ryan et al. 2013). The support for this clade was sensitive to outgroup sampling. Here we test whether this novel relationship is incongruent with the gene content analyses using the SOWH test, as implemented by the tool SOWHAT. This test strongly rejects Ctenophora+Porifera, with a p-value of 0. The test results are in the folder
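As a rough illustration of what a p-value of 0 means in a simulation-based test of this kind, the sketch below computes a generic parametric-bootstrap p-value: the fraction of statistics simulated under the constrained hypothesis that are at least as large as the observed one. The function name and all numbers are hypothetical, and this is not necessarily the exact calculation SOWHAT performs.

```python
def sowh_p_value(observed_delta, null_deltas):
    """Fraction of simulated (null) statistics that are >= the observed one.

    observed_delta: test statistic from the real data (hypothetical here,
        e.g. a likelihood difference between unconstrained and constrained trees).
    null_deltas: the same statistic computed on datasets simulated under
        the constrained hypothesis.
    """
    if not null_deltas:
        raise ValueError("need at least one simulated replicate")
    n_as_extreme = sum(1 for d in null_deltas if d >= observed_delta)
    return n_as_extreme / len(null_deltas)


# Hypothetical values, for illustration only (not results from this study).
null_deltas = [0.4, 1.2, 0.0, 2.5, 0.9]
observed_delta = 40.0

p = sowh_p_value(observed_delta, null_deltas)
print(p)
```

Under this reading, a reported p-value of 0 says that no simulated replicate reached the observed statistic, so the p-value is best interpreted as smaller than one over the number of replicates rather than as exactly zero.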
EST datasets, cat+gtr model
To get a sense of the most computationally intensive analyses first, cat+gtr (the most complex model available in PhyloBayes) was applied first to the EST datasets (the largest matrices). These analyses are in the folder est_pb_catgtr/. There are four matrices, which differ in which outgroups to Metazoa are included: Opisthokonta, Holozoa, Choanimalia, and Animalia. Two chains were run per matrix. Each chain was run for seven days on 40-48 cores.
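The layout of these runs (four matrices, two chains each) can be sketched as a set of PhyloBayes-MPI command lines. This is only an illustration, not the shell scripts actually used, which are in the repository: the alignment file names, chain names, the `mpirun` launcher, and the core count are assumptions, while `pb_mpi -d <alignment> -cat -gtr <chain>` follows the usual PhyloBayes-MPI invocation.

```python
import shlex


def pb_mpi_command(alignment, chain_name, n_procs, model_flags=("-cat", "-gtr")):
    """Build (but do not run) a PhyloBayes-MPI command line as a list of arguments."""
    return ["mpirun", "-np", str(n_procs), "pb_mpi",
            "-d", alignment, *model_flags, chain_name]


# Matrix names come from the text; the ".phy" file names are hypothetical.
matrices = ["Opisthokonta", "Holozoa", "Choanimalia", "Animalia"]

# Two independent chains per matrix, so that convergence can be assessed later.
commands = [
    pb_mpi_command(f"{name}.phy", f"{name}_chain{i}", n_procs=40)
    for name in matrices
    for i in (1, 2)
]

for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as argument lists keeps the sketch safe to run anywhere, since nothing is executed; on a cluster, each line would typically go into its own scheduler job.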
None of the analyses converged in this time (all had a maxdiff of 1). The results of these preliminary runs are summarized below.
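For readers unfamiliar with the statistic, maxdiff is the largest difference between chains in the frequency of any bipartition across the sampled trees. A minimal sketch of that calculation follows, assuming bipartition frequencies have already been tabulated per chain; the function name and all frequencies are hypothetical and are not taken from these runs.

```python
def max_diff(freqs_a, freqs_b):
    """Largest absolute difference in bipartition frequency between two chains.

    Each argument maps a bipartition (here a frozenset of the taxon names on
    one side of the split) to its frequency in that chain's sampled trees.
    A bipartition that is missing from a chain counts as frequency 0.
    """
    splits = set(freqs_a) | set(freqs_b)
    if not splits:
        raise ValueError("no bipartitions to compare")
    return max(abs(freqs_a.get(s, 0.0) - freqs_b.get(s, 0.0)) for s in splits)


# Hypothetical bipartitions and frequencies, for illustration only.
cnid_bilat = frozenset({"Cnidaria", "Bilateria"})
cteno_porif = frozenset({"Ctenophora", "Porifera"})

# Two chains that broadly agree: the largest discrepancy is small.
chain_1 = {cnid_bilat: 0.98, cteno_porif: 0.60}
chain_2 = {cnid_bilat: 0.95, cteno_porif: 0.52}
print(round(max_diff(chain_1, chain_2), 2))

# Two chains that disagree completely on one split: maxdiff is 1.
stuck_1 = {cnid_bilat: 1.0, cteno_porif: 1.0}
stuck_2 = {cnid_bilat: 1.0}
print(max_diff(stuck_1, stuck_2))
```

A maxdiff of 1 therefore means that at least one bipartition is present in every sampled tree of one chain and absent from every sampled tree of the other. As we understand the PhyloBayes manual's rule of thumb, values below about 0.3 are usually treated as acceptable and values below 0.1 as good.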
This analysis does not include any outgroup taxa, so it cannot be used to assess rooting. Most relationships within Metazoa are consistent across the sampled trees, receiving frequencies of 100%. A notable exception is Xenacoelomorpha, which is unstable.
Ctenophora is recovered as the sister group to all other animals, but with low support. The posterior probability of a clade comprised of all animals except Ctenophora is 67%. Most other relationships are consistent across the sampled trees, receiving frequencies of 100%. Again, Xenacoelomorpha taxa are unstable.
Ctenophora is recovered as the sister group to all other animals in 100% of the sampled trees. Again, Xenacoelomorpha taxa are unstable.
Ctenophora is recovered as the sister group to all other animals in 100% of the sampled trees. Xenacoelomorpha is polyphyletic.
Reduced taxon EST datasets, cat model
The cat analyses presented by Ryan et al. of the EST sequence dataset did not converge after months of run time. We therefore created reduced-taxon EST sequence datasets. The following taxa were removed from the EST matrices:
- Acropora palmata
These analyses, including the modified matrices, are in the folder
This analysis did converge (maxdiff=0.0700444). Cnidaria and Bilateria form a clade with a posterior probability of 100%, but there is no significant support for the relationship of Porifera, Placozoa, and Ctenophora to this clade.
This analysis did converge (maxdiff=0.0957494). It places Porifera as the sister group to all other animals with a posterior probability of 97%.
This analysis did converge (maxdiff=0.0732276). It places Ctenophora as the sister group to all other animals with a posterior probability of 98%.
Genome datasets, cat+gtr model
Here we check whether the clade comprised of Ctenophora+Porifera recovered in the cat analyses of the genome sequence datasets (Table 1, Ryan et al. 2013) is also recovered with the more complex cat+gtr model. These analyses are in the folder genome_pb_catgtr/. The Animalia matrix was not analyzed, since it has no outgroup taxa and therefore cannot inform rooting.
These runs are summarized below.
This analysis had acceptable convergence (maxdiff=0.144404). The root of the animal tree is unresolved: a clade comprised of all metazoans except the sponge has only 61% posterior probability.
This analysis did not converge (maxdiff = 1). Ctenophora+Porifera is recovered in 100% of the sampled trees.
This analysis did not converge (maxdiff = 1). Ctenophora+Porifera is recovered, but with only 50% posterior probability. This reflects the lack of convergence between chains: 100% of posterior trees in Chain 1 recovered Ctenophora+Porifera, while 100% of posterior trees in Chain 2 recovered Ctenophora as the sister group to all other animals.
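The 50% figure is simply what pooling two fully disagreeing chains produces. A small sketch of the arithmetic follows, assuming each chain contributes the same number of sampled trees; the counts of 1000 trees per chain are hypothetical, and only the 100% and 0% proportions come from the text.

```python
def pooled_clade_frequency(chain_counts):
    """Clade frequency after pooling the sampled trees from all chains.

    chain_counts: list of (trees_with_clade, trees_sampled), one pair per chain.
    """
    with_clade = sum(k for k, _ in chain_counts)
    total = sum(n for _, n in chain_counts)
    if total == 0:
        raise ValueError("no sampled trees")
    return with_clade / total


# Hypothetical sample sizes: Chain 1 always recovers Ctenophora+Porifera,
# Chain 2 never does.
chains = [(1000, 1000), (0, 1000)]

per_chain = [k / n for k, n in chains]
pooled = pooled_clade_frequency(chains)
print(per_chain, pooled)
```

The pooled value of 0.5 here does not reflect moderate support in either chain. It is an average over two chains that each sit on a different topology, which is why the per-chain frequencies are more informative than the pooled posterior when maxdiff is 1.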