
Introduction

This repository contains the code that describes most analyses presented in:

Zapata F, Wilson NG, Howison M, Andrade SCS, Jörger KM, Schrödl M, Goetz FE, Giribet G,
Dunn CW. (2014) Phylogenomic analyses of deep gastropod relationships reject Orthogastropoda.
BioRxiv doi:10.1101/007039.

This manuscript is now published in Proc. Roy. Soc. B.

On November 25, 2014 all files in this repo were updated with the correct identification for
one taxon used as outgroup in our phylogenetic analyses. This taxon was originally identified as
Chaetoderma sp. However, the correct name is Pholidoskepia sp.

Dependencies

These scripts require Agalma and its dependencies.
Agalma versions 0.3.4 and 0.3.5 were used to run the analyses.

Running the analyses

The analyses are broken into a series of scripts, which are available in the agalma-analyses/
and phylogenetic-analyses/ directories. The script master.sh within each of these directories
indicates the order in which the other scripts should be run. The phylogenetic-analyses/
directory also includes a series of Python scripts used to generate intermediate files.

All scripts include, as comments, commands for executing the analyses via the SLURM job
scheduler installed on the OSCAR cluster at Brown University. If you are running the analyses
without a job scheduler, these SLURM commands will be ignored. If you are using a job
scheduler, you will need to edit these commands according to the configuration of your own
system.
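
As a rough illustration of why the scheduler commands are harmless without SLURM, the sketch
below shows a hypothetical script header. The file contents and resource values are placeholder
assumptions, not the settings used in the original analyses.

```shell
#!/bin/bash
# Hypothetical example script; the resource values below are placeholders,
# not the settings used for the analyses in this repository.
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --cpus-per-task=2

# bash treats every "#SBATCH" line above as an ordinary comment, so the script
# runs the same way with or without a scheduler.
# SLURM sets SLURM_CPUS_PER_TASK for jobs that request --cpus-per-task;
# fall back to 1 when the variable is unset (no scheduler).
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${threads} thread(s)"
```

Under SLURM, such a script would be submitted with sbatch, which reads the #SBATCH lines as
options. Run directly with bash, the same lines are skipped.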

Is this a fully executable paper?

This manuscript is partially executable. The code explicitly describes how most analysis
steps were completed but is not entirely sufficient on its own to re-execute the whole
paper. There are several reasons for this:

  • This manuscript was written while we developed Agalma, and different versions of Agalma
    were used for different steps of the analysis. The command structure of Agalma changed
    slightly between these versions, so re-executing the entire set of analyses would require
    editing some commands so that they are all compliant with the most recent version of Agalma.

  • Some basic steps, such as removing taxa from matrices and updating taxon names, were
    performed manually. These steps are described in the manuscript.

  • Most figures were prepared manually to integrate results of several different analyses.

  • Some third-party data, e.g., 454 reads, were manually preprocessed prior to analysis.

  • The code provided here includes paths to local data files on our cluster. To rerun these
    analyses on another system, the data would need to be re-downloaded and the paths would
    need to be updated (see next section).
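
For the path updates mentioned in the last point, one possible approach is a search and
replace with sed. This is only a sketch: both path prefixes are hypothetical placeholders, and
the example works on a throwaway file rather than on the scripts in this repository.

```shell
# Hypothetical prefixes: neither path comes from this repository.
old_prefix="/path/on/original/cluster"
new_prefix="/path/on/your/system"

# Demonstrate on a throwaway file so nothing in the repository is modified.
workdir="$(mktemp -d)"
printf 'DATA=%s/reads_1.fastq\n' "$old_prefix" > "$workdir/example.sh"

# Rewrite in place. "|" is the sed delimiter because the paths contain "/".
# The .bak suffix keeps a backup of the original next to the edited file.
sed -i.bak "s|${old_prefix}|${new_prefix}|g" "$workdir/example.sh"

rewritten="$(cat "$workdir/example.sh")"
echo "$rewritten"
```

Because sed treats the old prefix as a regular expression, a prefix containing characters such
as "." or "[" would need escaping. Reviewing the .bak backups against the edited files is a
simple way to check the result.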

Data curation

The analyses/00-catalog.sh script we used to catalog our data for analysis points to
local data directories where we curated the new and previously-existing public data.

We provide several resources to help curate data for rerunning analyses on another
system:

  • All new data generated in this study can be downloaded directly from the GenBank
    sequence read archive (SRA) and cataloged in Agalma using the script
    analyses/00-import.sh. Note that if 00-import.sh is used to catalog all
    the data, the IDs for all taxa need to be updated in all other scripts.

  • We provide information on all previously published third-party data included in this
    manuscript in the table ThirdPartyData.csv.

  • We provide voucher information on all data included in this manuscript in the table
    Voucher_Information.csv.

The directory sra/ includes the scripts we used to prepare our data for upload to SRA.
Since the data are already available, there is no need to rerun these scripts. They are
provided as a record of how we prepared our data and as a template for others to upload
their own data.

Phylogenetic Data

The data/ directory contains all the sequence alignments, tree sets and summary trees
resulting from our phylogenetic analyses. Please refer to data/README.md for an
explanation of each data file.