This repository contains the code used for most of the analyses presented in:
Zapata F, Goetz FE, Smith SA, Howison M, Siebert S, Church SH, Sanders SM, Ames CL, McFadden CS, France SC, Daly M, Collins AG, Haddock SHD, Dunn CW, Cartwright P. (2015) Phylogenomic analyses support traditional relationships within Cnidaria. PLoS One 10(10): e0139068. doi:10.1371/journal.pone.0139068. bioRxiv preprint doi:10.1101/017632.
Please see the figures folder for vector and web-ready versions of the figures and original artwork.
These scripts require Agalma and its dependencies. Agalma versions 0.4.0 and 0.5.0 were used to run the analyses.
Running the analyses
The analyses are broken into a series of scripts in the
phylogenetic-analyses/ directories. The master.sh script within each of
these directories indicates the order in which the other scripts should be
run. Each directory also includes a series of Python scripts used to
generate intermediate files.
All scripts include, as comments, commands for executing the analyses via the SLURM job scheduler installed on the OSCAR cluster at Brown University. If you are running the analyses without a job scheduler, these SLURM commands will simply be ignored. If you are using a job scheduler, you will need to edit these commands to match the configuration of your own system.
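As an illustration of how SLURM directives coexist with plain shell execution, here is a hypothetical script in the same style. The resource values, step names, and log file are invented placeholders, not taken from the actual analysis scripts:

```shell
#!/bin/sh
# Hypothetical excerpt in the style of the analysis scripts. To a plain
# shell, the #SBATCH lines below are ordinary comments and are ignored;
# when the script is submitted with `sbatch`, SLURM reads them as job
# parameters. The time/memory values here are invented placeholders.
#SBATCH --time=24:00:00
#SBATCH --mem=32G
#SBATCH -n 8

set -e                               # stop at the first failing step

LOG=pipeline.log
: > "$LOG"                           # start a fresh log
echo "step 1: assemble" >> "$LOG"    # placeholder for a real analysis command
echo "step 2: align"    >> "$LOG"    # placeholder
cat "$LOG"
```

The same file can be run directly (`sh script.sh`) on a workstation or submitted with `sbatch script.sh` on a SLURM cluster; only in the latter case do the #SBATCH values take effect.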
Is this a fully executable paper?
This manuscript is partially executable. The code explicitly describes how most analysis steps were completed, but it is not sufficient on its own to re-execute the whole paper, for several reasons:
- Some basic steps, such as removing taxa from matrices and updating taxon names, were performed manually. These steps are described in the manuscript.
- Most figures were prepared manually to integrate results of several different analyses.
- Some third-party data, e.g., 454 reads, were manually preprocessed prior to analysis.
- The code provided here includes paths to local data files on our cluster. To rerun these analyses on another system, the data would need to be re-downloaded and the paths updated (see next section).
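Hard-coded paths can be rewritten in one pass with sed. In the hypothetical sketch below, both paths are invented placeholders (not the actual locations used in this repository), and the demo file stands in for a real analysis script:

```shell
#!/bin/sh
# Hypothetical sketch of updating a hard-coded data path. OLD is a
# made-up stand-in for the original cluster path; NEW is wherever the
# re-downloaded data live on your own system.
OLD=/gpfs/data/cnidaria
NEW=/home/user/cnidaria-data

printf 'DATA=%s/sra\n' "$OLD" > demo.sh    # throwaway stand-in script
sed -i.bak "s|$OLD|$NEW|g" demo.sh         # edit in place, keep a .bak copy
cat demo.sh
```

In practice the sed line would be run over the real scripts (e.g., `sed -i.bak "s|$OLD|$NEW|g" *.sh` inside each analysis directory), with the `.bak` copies kept until the new paths are verified.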
The agalma-analyses/00-catalog.sh script we used to catalog our data for analysis points to
local data directories where we curated the new and previously existing public data.
We provide several resources to help curate the data for rerunning the analyses on another system:
All new data generated in this study can be downloaded directly from the NCBI Sequence Read Archive (SRA) and cataloged in Agalma using the script
agalma-analyses/00-import.sh. Note that if
00-import.sh is used to catalog all the data, the IDs for all taxa need to be updated in all the other scripts.
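One way to track down and swap out an old taxon ID is a grep-and-sed pass. In this hypothetical sketch, the IDs are made-up placeholders (not real IDs from this study) and the demo file stands in for a real downstream script:

```shell
#!/bin/sh
# Hypothetical sketch: locate and rewrite an old catalog ID in the
# downstream scripts. OLD_ID and NEW_ID are invented placeholders.
OLD_ID=SRX000001
NEW_ID=my_local_id

printf '# catalog id: %s\n' "$OLD_ID" > demo-analysis.sh  # stand-in script
grep -l "$OLD_ID" demo-analysis.sh |                      # files mentioning the old ID
  xargs sed -i.bak "s/$OLD_ID/$NEW_ID/g"                  # swap in the new ID
cat demo-analysis.sh
```

Running `grep -rn "$OLD_ID" .` first, before any rewrite, is a cheap way to confirm every script that would be touched.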
We provide detailed information on all data included in this manuscript in the table
The sra/ directory includes the scripts we used to prepare our data for upload to the SRA.
Since the data are already available, there is no need to rerun these scripts. They are
provided as a record of how we prepared our data and as a template for others to upload
their own data.
The data/ directory contains all the sequence alignments, tree sets, and summary trees
resulting from our phylogenetic analyses. Please refer to
data/README.md for an
explanation of each data file.
We have deposited the assemblies we generated for this study at https://bitbucket.org/caseywdunn/cnidaria2014-assemblies/. Public data are available elsewhere (the NCBI EST database, JGI, NHGRI, FlyBase).