This repository aims to host a set of Python scripts that simulate transcriptomes with known orthology/paralogy relationships, gene tree topologies, and species tree topologies. It will also include a summary statistics script for assessing the performance of phylogenetics programs at the homolog clustering level and at the final tree reconstruction level.
To test, simply run:
python simulate.py -p params.txt
Species and Gene Tree Simulation
Species and gene trees are simulated with SimPhy.
Sequences are then simulated along each gene tree with indel-seq-gen, using root sequences taken from assembled transcriptomes from the gastropod project.
Reads are simulated for each gene sequence, sorted by taxon, using RNASeqReadSimulator.
Homology clustering is performed with either fablast and mcl or blastp and mcl from the agalma package. The method for homology assessment has yet to be determined.
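Since the assessment method is still open, the following is only a minimal sketch of one possible approach, not the repository's method. It assumes the clustering result is available in the usual mcl label output (one cluster per line, members separated by tabs) and that the true homolog families are known from the simulation. The function names and the sequence identifiers are hypothetical. It scores a clustering by the pairs of sequences it places together:

```python
from itertools import combinations


def parse_clusters(text):
    """Parse cluster lines: one cluster per line, members tab-separated.

    Assumption: this matches the mcl label output format.
    """
    return [line.strip().split("\t") for line in text.splitlines() if line.strip()]


def co_clustered_pairs(clusters):
    """Return the set of unordered sequence pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        for pair in combinations(sorted(set(members)), 2):
            pairs.add(frozenset(pair))
    return pairs


def pairwise_scores(true_clusters, predicted_clusters):
    """Pairwise precision and recall of predicted clusters against the truth."""
    true_pairs = co_clustered_pairs(true_clusters)
    pred_pairs = co_clustered_pairs(predicted_clusters)
    shared = len(true_pairs & pred_pairs)
    precision = shared / len(pred_pairs) if pred_pairs else 0.0
    recall = shared / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Hypothetical example: two simulated families, one sequence misplaced.
true_families = [["a1", "a2", "a3"], ["b1", "b2"]]
predicted = parse_clusters("a1\ta2\nb1\tb2\ta3\n")
precision, recall = pairwise_scores(true_families, predicted)
print(precision, recall)
```

Pairwise scoring is used here because it needs no mapping between predicted and true cluster labels. Other measures could be substituted once the assessment method is decided.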