cnidaria2014


Introduction

This repository contains the code that describes most analyses presented in:

Zapata F, Goetz FE, Smith SA, Howison M, Siebert S, Church SH, Sanders SM, Ames CL, McFadden CS, France SC, Daly M, Collins AG, Haddock SHD, Dunn CW, Cartwright P. (2015) Phylogenomic analyses support traditional relationships within Cnidaria. PLoS One 10(10): e0139068. doi:10.1371/journal.pone.0139068. bioRxiv preprint doi:10.1101/017632.

Figure 4

Figure 4 of the paper is embedded in the repository overview. Please see the figures folder for vector and web-ready versions of the figures and original artwork.

Dependencies

These scripts require Agalma and its dependencies. Agalma versions 0.4.0 and 0.5.0 were used to run the analyses.

Running the analyses

The analyses are broken into a series of scripts, available in the agalma-analyses/ and phylogenetic-analyses/ directories. The script master.sh within each of these directories indicates the order in which the other scripts should be run. The phylogenetic-analyses/ directory also includes a series of Python scripts used to generate intermediate files.
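As a minimal sketch (with invented script names and trivial commands standing in for the real analyses), driving the scripts in the order master.sh prescribes amounts to running them sequentially:

```shell
# Sketch only: a throwaway directory with two numbered stand-in scripts,
# run in sequence the way master.sh orders the real analysis scripts.
set -e
workdir=$(mktemp -d)                      # stand-in for agalma-analyses/
printf 'echo step1\n' > "$workdir/01-first.sh"
printf 'echo step2\n' > "$workdir/02-second.sh"
log=""
for script in "$workdir"/0*.sh; do        # glob expands in sorted order
  log="$log $(bash "$script")"
done
rm -rf "$workdir"
echo "$log"                               # -> " step1 step2"
```

In the real pipeline each numbered script depends on the outputs of the previous ones, so the order given by master.sh matters.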

All scripts include, as comments, commands for executing the analyses via the SLURM job scheduler installed on the OSCAR cluster at Brown University. If you are running the analyses without a job scheduler, then these SLURM commands will be ignored. If you are using a job scheduler, you will need to edit these commands according to the configuration of your own system.
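For example, a script header along these lines (the resource values are invented; your cluster's requirements will differ) runs unchanged with or without SLURM, since #SBATCH directives are ordinary comments to bash:

```shell
#!/bin/bash
#SBATCH --time=24:00:00   # hypothetical resource requests; edit these
#SBATCH --mem=8G          # to match your own scheduler's configuration
#SBATCH -n 8
# Without a scheduler, run directly:  bash 01-example.sh
# With SLURM, submit the same file:   sbatch 01-example.sh
msg="analysis step"
echo "$msg"
```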

Is this a fully executable paper?

This manuscript is partially executable. The code explicitly describes how most analysis steps were completed but is not entirely sufficient on its own to re-execute the whole paper. There are several reasons for this:

  • Some basic steps, such as removing taxa from matrices and updating taxon names, were performed manually. These steps are described in the manuscript.

  • Most figures were prepared manually to integrate results of several different analyses.

  • Some third-party data, e.g., 454 reads, were manually preprocessed prior to analysis.

  • The code provided here includes paths to local data files on our cluster. To rerun these analyses on another system, the data would need to be re-downloaded and the paths would need to be updated (see next section).
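Rewriting the hard-coded paths can be done in bulk. In this sketch both the old and new data roots are invented placeholders, and a one-line stand-in file takes the place of the real catalog script:

```shell
# Placeholder paths: neither reflects the actual layout of our cluster.
old_root="/gpfs/data/cnidaria"            # invented local path
new_root="$HOME/cnidaria-data"            # wherever you re-download the data
tmp=$(mktemp)                             # stand-in for a script with path entries
printf 'DATA=%s/sample_1.fastq\n' "$old_root" > "$tmp"
sed -i "s|$old_root|$new_root|g" "$tmp"   # rewrite every old path in place
hits=$(grep -c "$new_root" "$tmp")        # count of rewritten lines
rm -f "$tmp"
```

Using `|` as the sed delimiter avoids having to escape the slashes in the paths.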

Data curation

The script agalma-analyses/00-catalog.sh, which we used to catalog our data for analysis, points to local data directories where we curated the new data and previously existing public data.

We provide two resources to help curate data for rerunning the analyses on another system:

  • All new data generated in this study can be downloaded directly from the NCBI Sequence Read Archive (SRA) and cataloged in Agalma using the script agalma-analyses/00-import.sh. Note that if 00-import.sh is used to catalog all the data, the IDs for all taxa need to be updated in all other scripts.

  • We provide detailed information on all data included in this manuscript in the table SupplementaryTable1.csv.
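Updating the taxon IDs across the downstream scripts can likewise be scripted. The IDs and the command line below are invented placeholders, not the actual Agalma invocations:

```shell
# Placeholder demo: swap an old catalog ID for a new one in every script.
demo=$(mktemp -d)
printf 'run_analysis --taxon-id OLD_ID_42\n' > "$demo/10-assemble.sh"
old_id="OLD_ID_42"; new_id="NEW_ID_42"    # invented IDs for illustration
for f in "$demo"/*.sh; do
  sed -i "s/$old_id/$new_id/g" "$f"       # rewrite the ID in each script
done
updated=$(grep -o "$new_id" "$demo/10-assemble.sh")
rm -rf "$demo"
```

After importing with 00-import.sh, the actual IDs to substitute are the ones Agalma assigns in its catalog.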

The directory sra/ includes the scripts we used to prepare our data for upload to SRA. Since the data are already available, there is no need to rerun these scripts. They are provided as a record of how we prepared our data and as a template for others to upload their own data.

Phylogenetic Data

The data/ directory contains all the sequence alignments, tree sets and summary trees resulting from our phylogenetic analyses. Please refer to data/README.md for an explanation of each data file.

Transcriptome assemblies

We have deposited the assemblies we generated for this study at https://bitbucket.org/caseywdunn/cnidaria2014-assemblies/. Public data are available elsewhere (the NCBI EST database, JGI, NHGRI, FlyBase).