This repository contains the code that describes most analyses presented in:

Zapata F, Wilson NG, Howison M, Andrade SCS, Jörger KM, Schrödl M, Goetz FE, Giribet G, Dunn CW. (2014) Phylogenomic analyses of deep gastropod relationships reject Orthogastropoda. bioRxiv doi:10.1101/007039.

This manuscript is now published in Proc. Roy. Soc. B.

On November 25, 2014, all files in this repo were updated with the correct identification for one taxon used as an outgroup in our phylogenetic analyses. This taxon was originally identified as Chaetoderma sp.; the correct name is Pholidoskepia sp.


These scripts require Agalma and its dependencies. Agalma versions 0.3.4 and 0.3.5 were used to run the analyses.

Running the analyses

The analyses are broken into a series of scripts, which are available in the agalma-analyses/ and phylogenetic-analyses/ directories. The script within each of these directories indicates the order in which all the other scripts should be run. The phylogenetic-analyses/ directory also includes a series of Python scripts used to generate intermediate files.

All scripts include, as comments, commands for executing the analyses via the SLURM job scheduler installed on the OSCAR cluster at Brown University. If you are running the analyses without a job scheduler, then these SLURM commands will be ignored. If you are using a job scheduler, you will need to edit these commands according to the configuration of your own system.
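
As a rough aid when adapting the scripts to another scheduler configuration, the following Python sketch lists the SLURM directives found in a script so they can be reviewed in one place. It assumes the directives are written as `#SBATCH` comment lines, which is the usual SLURM convention. The function name, the sample script text and its option values are hypothetical and are not taken from this repository.

```python
def list_sbatch_directives(script_text):
    """Return the #SBATCH directive lines found in a shell script's text."""
    directives = []
    for line in script_text.splitlines():
        stripped = line.strip()
        # SLURM reads lines beginning with "#SBATCH"; the shell sees a comment.
        if stripped.startswith("#SBATCH"):
            directives.append(stripped)
    return directives


# Hypothetical script text, for illustration only.
example_script = """#!/bin/bash
#SBATCH -t 24:00:00
#SBATCH --mem=16G
echo "analysis step"
"""

for directive in list_sbatch_directives(example_script):
    print(directive)
```

Reviewing the directives this way before submitting a job makes it easier to spot time, memory or partition settings that are specific to one cluster.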

Is this a fully executable paper?

This manuscript is partially executable. The code explicitly describes how most analysis steps were completed but is not entirely sufficient on its own to re-execute the whole paper. There are several reasons for this:

  • This manuscript was written while we developed Agalma, and different versions of Agalma were used for different steps of the analysis. The command structure of Agalma changed slightly between these versions, so re-executing the entire set of analyses would require editing some commands so that they are all compliant with the most recent version of Agalma.

  • Some basic steps, such as removing taxa from matrices and updating taxon names, were performed manually. These steps are described in the manuscript.

  • Most figures were prepared manually to integrate results of several different analyses.

  • Some third-party data, e.g. 454 reads, were manually preprocessed prior to analysis.

  • The code provided here includes paths to local data files on our cluster. To rerun these analyses on another system, the data would need to be re-downloaded and the paths would need to be updated (see next section).
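
The path update mentioned in the last point can be approached as a prefix substitution. The following is a minimal Python sketch and is not part of the repository. Both directory prefixes and the sample command line are hypothetical placeholders, so substitute the prefix actually used in the scripts and the location of your own copy of the data.

```python
def rewrite_prefix(script_text, old_prefix, new_prefix):
    """Replace every occurrence of old_prefix with new_prefix.

    Returns the rewritten text and the number of replacements made.
    """
    count = script_text.count(old_prefix)
    return script_text.replace(old_prefix, new_prefix), count


# Hypothetical prefixes and command line, for illustration only.
OLD_PREFIX = "/cluster/data/project"
NEW_PREFIX = "/home/me/data"

sample_line = "some_command /cluster/data/project/reads_1.fastq\n"
updated, n_changes = rewrite_prefix(sample_line, OLD_PREFIX, NEW_PREFIX)
print(updated, end="")
print(f"{n_changes} path(s) updated")
```

Returning the replacement count alongside the text makes it possible to confirm that a script was actually changed before writing the result back to disk.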

Data curation

The analyses/ script we used to catalog our data for analysis points to local data directories where we curated the new and previously existing public data.

We provide a couple of resources to help curate data for rerunning analyses on another system:

  • All new data generated in this study can be downloaded directly from the NCBI Sequence Read Archive (SRA) and cataloged in Agalma using the script analyses/. Note that if this script is used to catalog all the data, the IDs for all taxa need to be updated in all other scripts.

  • We provide information on all previously published third-party data included in this manuscript in the table ThirdPartyData.csv.

  • We provide voucher information on all data included in this manuscript in the table Voucher_Information.csv.
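
If the taxon IDs do need to be updated across the other scripts, one way to keep the change consistent is a whole-token substitution driven by a lookup table. The Python sketch below is illustrative only. The function name, the IDs in the mapping and the sample line are hypothetical and do not correspond to IDs used in this repository.

```python
import re


def remap_ids(text, id_map):
    """Replace each old ID in text with its new ID, matching whole tokens only."""
    if not id_map:
        return text
    # Longest IDs first, as a safeguard when one ID is a prefix of another.
    alternatives = sorted(id_map, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(key) for key in alternatives) + r")(?!\w)"
    )
    return pattern.sub(lambda match: id_map[match.group(1)], text)


# Hypothetical old-to-new ID mapping, for illustration only.
id_map = {"T1": "T101", "T12": "T112"}

print(remap_ids("load T1 T12 T123", id_map))
```

Matching whole tokens means an ID that merely appears inside a longer identifier (here `T123`) is left untouched, which a plain string replacement would not guarantee.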

The directory sra/ includes the scripts we used to prepare our data for upload to SRA. Since the data are already available, there is no need to rerun these scripts. They are provided as a record of how we prepared our data and as a template for others to upload their own data.

Phylogenetic Data

The data/ directory contains all the sequence alignments, tree sets and summary trees resulting from our phylogenetic analyses. Please refer to data/ for an explanation of each data file.