This repository aims to host a set of Python scripts that simulate transcriptomes with known orthology/paralogy relationships, gene tree topologies, and species tree topologies. It will also include a summary statistics script for assessing the performance of phylogenetics programs at two levels: homolog clustering and final tree reconstruction.


To test, simply run:

python -p params.txt
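
The layout of params.txt is not described here. As a minimal sketch of how a script might accept the `-p` option, the example below assumes a hypothetical `key = value` format with `#` comments. The function names, the `--params` long option, and the parameter names in the usage note are all illustrative assumptions, not the repository's actual interface.

```python
import argparse


def parse_params(lines):
    """Parse 'key = value' lines into a dict (assumed, hypothetical format).

    Blank lines and anything after a '#' are ignored.
    """
    params = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        params[key.strip()] = value.strip()
    return params


def build_parser():
    """Build a command-line parser exposing a -p option for the parameter file."""
    parser = argparse.ArgumentParser(description="Transcriptome simulation (sketch)")
    parser.add_argument("-p", "--params", required=True,
                        help="path to the parameter file")
    return parser


def main(argv=None):
    """Read the parameter file named on the command line and return its contents."""
    args = build_parser().parse_args(argv)
    with open(args.params) as handle:
        return parse_params(handle)
```

Calling `main(["-p", "params.txt"])` would return the parsed settings as a dictionary of strings; converting values to numbers is left to the caller in this sketch.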

Pipeline Organization

Species and Gene Tree Simulation

Species and gene tree simulation is based on SimPhy.

Sequence Simulation

Sequence simulation for each gene tree is then based on indel-seq-gen, with root sequences taken from assembled transcriptomes from the gastropod project.

Read Simulation

Read simulation for each gene sequence, sorted by taxon, is done using RNASeqReadSimulator.
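
Sorting gene sequences by taxon before read simulation could be sketched as below. This is an illustration under stated assumptions, not the repository's code: it assumes FASTA headers of the hypothetical form `taxon|gene`, and the function names are invented for the example.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_taxon(records, sep="|"):
    """Group (header, sequence) records by the taxon prefix of each header.

    Assumes headers look like 'taxon|gene' (a hypothetical convention).
    Returns a dict whose keys are in sorted taxon order.
    """
    groups = defaultdict(list)
    for header, seq in records:
        taxon = header.split(sep, 1)[0]
        groups[taxon].append((header, seq))
    return dict(sorted(groups.items()))
```

Each per-taxon group could then be written to its own FASTA file and passed to the read simulator as one transcriptome.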

Homology Assessment

Homology clustering is performed using either fablast with mcl, or blastp with mcl from the agalma package. The method for homology assessment has yet to be determined.
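
Because the simulated homolog clusters are known, one possible assessment (a sketch only, not a method this repository has settled on) is to compare predicted and true clusters by the pairs of sequences they place together, then report pairwise precision, recall, and F1. The function names and the representation of clusters as lists of sequence identifiers are assumptions made for the example.

```python
from itertools import combinations


def cooccurring_pairs(clusters):
    """Return the set of unordered sequence-ID pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(set(cluster)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(true_clusters, predicted_clusters):
    """Compute pairwise (precision, recall, F1) of predicted vs. true clusters."""
    true_pairs = cooccurring_pairs(true_clusters)
    pred_pairs = cooccurring_pairs(predicted_clusters)
    shared = len(true_pairs & pred_pairs)
    precision = shared / len(pred_pairs) if pred_pairs else 0.0
    recall = shared / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Example: two true homolog clusters, with one sequence misplaced by the predictor.
truth = [["a", "b", "c"], ["d", "e"]]
predicted = [["a", "b"], ["c", "d", "e"]]
print(pairwise_scores(truth, predicted))  # (0.5, 0.5, 0.5)
```

Pair counting penalizes both over-splitting (lower recall) and over-merging (lower precision), which makes it a convenient first check before comparing reconstructed trees.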